[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2019-10-14 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951076#comment-16951076
 ] 

Steve Loughran commented on HADOOP-13223:
-

You need to upgrade the version of snappy to deal with new platforms, e.g. arm-64.

Pure NIO would be best. It would also be much better in testing, where it is 
nearly impossible to get that native library onto the classpath.

> winutils.exe is a bug nexus and should be killed with an axe.
> -
>
> Key: HADOOP-13223
> URL: https://issues.apache.org/jira/browse/HADOOP-13223
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: bin
> Affects Versions: 2.6.0
> Environment: Microsoft Windows, all versions
> Reporter: john lilley
> Priority: Major
>
> winutils.exe was apparently created as a stopgap measure to allow Hadoop to 
> "work" on Windows platforms, because the NativeIO libraries aren't 
> implemented there (edit: even NativeIO probably doesn't cover the operations 
> that winutils.exe is used for).  Rather than building a DLL that makes native 
> OS calls, the creators of winutils.exe must have decided that it would be 
> more expedient to create an EXE to carry out file system operations in a 
> linux-like fashion.  Unfortunately, like many stopgap measures in software, 
> this one has persisted well beyond its expected lifetime and usefulness.  My 
> team creates software that runs on Windows and Linux, and winutils.exe is 
> probably responsible for 20% of all issues we encounter, both during 
> development and in the field.
> Problem #1 with winutils.exe is that it is simply missing from many popular 
> distros and/or the client-side software installation for said distros, when 
> supplied, fails to install winutils.exe.  Thus, as software developers, we 
> are forced to pick one version and distribute and install it with our 
> software.
> Which leads to problem #2: versions of winutils.exe are not always compatible.  In 
> particular, MapR MUST have its winutils.exe in the system path, but doing so 
> breaks the Hadoop distro for every other Hadoop vendor.  This makes creating 
> and maintaining test environments that work with all of the Hadoop distros we 
> want to test unnecessarily tedious and error-prone.
> Problem #3 is that the mechanism by which you inform the Hadoop client 
> software where to find winutils.exe is poorly documented and fragile.  First, 
> it can be in the PATH.  If it is in the PATH, that is where it is found.  
> However, the documentation, such as it is, makes no mention of this, and 
> instead says that you should set the HADOOP_HOME environment variable, which 
> does NOT override the winutils.exe found in your system PATH.
> Which leads to problem #4: There is no logging that says where winutils.exe 
> was actually found and loaded.  Because of this, diagnosing cases where the 
> wrong winutils.exe was picked up is extremely difficult.
> Problem #5 is that most of the time, such as when accessing straight up HDFS 
> and YARN, one does not *need* winutils.exe.  But if it is missing, the log 
> messages complain about its absence.  When we are trying to diagnose an 
> obscure issue in Hadoop (of which there are many), the presence of this red 
> herring leads to all sorts of time wasted until someone on the team points 
> out that winutils.exe is not the problem, at least not this time.
> Problem #6 is that errors and stack traces from issues involving winutils.exe 
> are not helpful.  The Java stack trace ends at the ProcessBuilder call.  Only 
> through bitter experience is one able to connect the dots from 
> "ProcessBuilder is the last thing on the stack" to "something is wrong with 
> winutils.exe".
> Note that none of these involve running Hadoop on Windows.  They are only 
> encountered when using Hadoop client libraries to access a cluster from 
> Windows.
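
Problems #4 and #6 above can be illustrated with a minimal sketch. It assumes a hypothetical helper class (`HelperBinary` is not part of Hadoop) that searches a PATH-style string, logs which file it resolved, and wraps launch failures in an exception that names the binary instead of leaving `ProcessBuilder` as the only clue. On Windows the name passed in would need to include the `.exe` suffix.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;
import java.util.regex.Pattern;

/** Hypothetical helper, not a Hadoop class: resolves a helper binary and reports where it came from. */
final class HelperBinary {
    private static final Logger LOG = Logger.getLogger(HelperBinary.class.getName());

    /** Returns the first regular file called {@code name} found in a PATH-style string. */
    static Optional<Path> findOnPath(String name, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(dir, name);
                if (Files.isRegularFile(candidate)) {
                    return Optional.of(candidate.toAbsolutePath());
                }
            } catch (InvalidPathException e) {
                // Malformed PATH entry (for example stray quotes): skip it.
            }
        }
        return Optional.empty();
    }

    /** Runs the binary; a launch failure is rethrown with the binary's path in the message. */
    static int run(Path binary, List<String> args) throws IOException, InterruptedException {
        LOG.info("Using helper binary at " + binary);
        List<String> command = new ArrayList<>();
        command.add(binary.toString());
        command.addAll(args);
        try {
            Process process = new ProcessBuilder(command).inheritIO().start();
            return process.waitFor();
        } catch (IOException e) {
            throw new IOException(
                "Failed to launch helper binary " + binary + " with args " + args, e);
        }
    }
}
```

A caller might use `HelperBinary.findOnPath("winutils.exe", System.getenv("PATH"))` and pass the result to `run`; the log line then records exactly which copy was used.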



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2019-10-14 Thread john lilley (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951041#comment-16951041
 ] 

john lilley commented on HADOOP-13223:
--

I've recently started working with the snappy compressor, and many Hadoop 
libraries rely on it as well.  It is impressive that a user of the library 
doesn't need to know anything about the native code – the dll/so is cached to 
temp disk upon first use and loaded.  If native calls are still necessary to 
achieve linux FS emulation (instead of the NIO ACL interface) it would be 
better to emulate this approach as it would eliminate questions of version and 
compatibility.
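
A minimal sketch of that extract-on-first-use approach, assuming a hypothetical `BundledBinary` helper and a resource name chosen by the caller (neither is taken from snappy or Hadoop): the binary ships inside the jar, is copied to a temp file, and the returned path is what would then be passed to `System.load` or to a process launcher.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch: unpack a binary bundled in the jar to a temp file before using it. */
final class BundledBinary {

    /** Copies the stream to a fresh temp file, marks it for deletion on exit, and returns its path. */
    static Path extractToTemp(InputStream in, String prefix, String suffix) throws IOException {
        Path target = Files.createTempFile(prefix, suffix);
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        target.toFile().deleteOnExit();
        return target;
    }

    /** Looks the resource up relative to {@code owner}; fails loudly if it was not bundled. */
    static Path extractResource(Class<?> owner, String resourceName, String suffix)
            throws IOException {
        try (InputStream in = owner.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not bundled: " + resourceName);
            }
            return extractToTemp(in, "bundled-", suffix);
        }
    }
}
```

Because the binary travels with the jar that calls it, the version on disk always matches the Java code, which is the compatibility property described above.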




[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2019-02-16 Thread john lilley (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770156#comment-16770156
 ] 

john lilley commented on HADOOP-13223:
--

One more comment. We hit this issue at a customer site, and it took a while to 
diagnose.  Winutils.exe depends on msvcr110.dll (the Visual C++ 2012 redist). 
This was once so common that we never had any issue – it always just happened 
to be installed on the system.  But fast-forward a few years and VC++ 2012 may 
no longer be the common redist it once was, so we anticipate needing to install 
it as part of our solution. We also moved on from VC++ 2012 a while ago, so our 
app no longer includes it as a matter of course.  

I do not recommend moving this to a DLL, because as many commenters have 
pointed out, many of the same issues exist there as well. Rather, use the 
Windows ACL support built into Java NIO.  See
[https://docs.oracle.com/javase/7/docs/api/java/nio/file/attribute/AclFileAttributeView.html]
[https://stackoverflow.com/questions/664432/how-do-i-programmatically-change-file-permissions]
Not that this is simple, but neither is winutils C code.
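
As a rough sketch of what the NIO route could look like, assuming a hypothetical `OwnerOnlyAccess` helper (not Hadoop code): it emulates something like `chmod 700` by using the POSIX attribute view where the file system offers one and falling back to `AclFileAttributeView` otherwise. Mapping a POSIX mode onto a single owner ACL entry is an approximation made for this example, not a claim about how winutils does it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.AclEntry;
import java.nio.file.attribute.AclEntryPermission;
import java.nio.file.attribute.AclEntryType;
import java.nio.file.attribute.AclFileAttributeView;
import java.nio.file.attribute.PosixFileAttributeView;
import java.nio.file.attribute.PosixFilePermissions;
import java.nio.file.attribute.UserPrincipal;
import java.util.Collections;
import java.util.EnumSet;

/** Hypothetical sketch: restrict a file to its owner without any native helper. */
final class OwnerOnlyAccess {

    /** Restricts {@code file} to its owner and returns the name of the view that was used. */
    static String restrictToOwner(Path file) throws IOException {
        PosixFileAttributeView posix =
            Files.getFileAttributeView(file, PosixFileAttributeView.class);
        if (posix != null) {
            posix.setPermissions(PosixFilePermissions.fromString("rwx------"));
            return "posix";
        }
        AclFileAttributeView acl =
            Files.getFileAttributeView(file, AclFileAttributeView.class);
        if (acl != null) {
            UserPrincipal owner = acl.getOwner();
            // Replace the whole ACL with one ALLOW entry granting the owner everything.
            AclEntry entry = AclEntry.newBuilder()
                .setType(AclEntryType.ALLOW)
                .setPrincipal(owner)
                .setPermissions(EnumSet.allOf(AclEntryPermission.class))
                .build();
            acl.setAcl(Collections.singletonList(entry));
            return "acl";
        }
        throw new UnsupportedOperationException("No POSIX or ACL view for " + file);
    }
}
```

`Files.getFileAttributeView` returns null when a view is unsupported, so the same code path can be exercised on Linux in tests and on NTFS in the field.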


[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2016-06-08 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321328#comment-15321328
 ] 

Colin Patrick McCabe commented on HADOOP-13223:
---

Thanks for the explanation, [~cnauroth].  Migrating functionality to the DLL 
seems like a good idea long-term for a lot of reasons.

> winutils.exe is a bug nexus and should be killed with an axe.
> -
>
> Key: HADOOP-13223
> URL: https://issues.apache.org/jira/browse/HADOOP-13223
> Project: Hadoop Common
> Issue Type: Improvement
> Components: bin
> Affects Versions: 2.6.0
> Environment: Microsoft Windows, all versions
> Reporter: john lilley
>



[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2016-06-07 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319513#comment-15319513
 ] 

Chris Nauroth commented on HADOOP-13223:


bq. It's not clear to me why a DLL would be less prone to path problems than an 
EXE.  It seems like we should just be putting a version number on the EXE, so 
that we avoid these conflicts. We have the same problem with libhadoop-- see 
HADOOP-11127.

You're correct that hadoop.dll suffers the same challenges as libhadoop.so, but 
on Windows, the challenges are even greater.

First, there is the simple matter that there are 2 binaries to grapple with 
instead of 1.  Both hadoop.dll and winutils.exe are required.

Second, there is the problem of understanding how a Hadoop process loads 
winutils.exe.  In the case of hadoop.dll, the running process uses well-known, 
well-defined dynamic linking mechanisms to find the dll.  Experienced Windows 
developers and admins will be familiar with the [DLL search 
path|https://msdn.microsoft.com/en-us/library/windows/desktop/ms682586(v=vs.85).aspx].
  For winutils.exe, there is no such familiar ground on which developers and 
admins can build an understanding.  Hadoop uses separate, arbitrary logic to 
find the winutils.exe binary.  Unlike the DLL search path, this is not 
consistent with typical dynamic linking practices, so it can be a source of 
confusion.

I am +1 for migrating more functionality into hadoop.dll and eventually 
eliminating winutils.exe.  This addresses the additional difficulty of 
coordinating 2 binaries and the additional difficulty of understanding how it 
gets loaded.  It does not address the challenge of version compatibility 
between Java and native code during dynamic linking, but that issue is tracked 
elsewhere.


[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2016-06-07 Thread john lilley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319400#comment-15319400
 ] 

john lilley commented on HADOOP-13223:
--

[~cmccabe], Looking at the various issues we've encountered, I agree that most 
of them can be addressed by keeping winutils.exe and doing the following:
1: Taking steps to ensure that winutils.exe is always available on client 
library downloads IN A CONSISTENT PLACE
2: #1 can be made automatic by bundling winutils.exe into the 
RawLocalFileSystem jar (or perhaps NativeIO?) and caching it to a temporary 
place before invoking it.
3: Removing HADOOP_HOME, hadoop.home.dir, and PATH as alternate ways of finding 
winutils.exe.  If #2 is done, this should always yield a full path to exactly 
the winutils.exe that we want.
4: Hiding all access to winutils under a consistent API (in RawLocalFileSystem 
or NativeIO) for performing file operations (chown, chmod, symlink, readlink, 
etc).  This means removing or privatizing almost everything in the Shell class, 
but especially the following: Shell.getWinUtilsPath(), Shell.WINUTILS, 
Shell.get*Command().
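
Item 4 could be sketched roughly as follows. The interface and class names here (`LocalFileOps`, `NioLocalFileOps`) are hypothetical and do not exist in Hadoop; the point is only that callers see one API for operations such as symlink and readlink, while the backend, whether a cached winutils.exe or plain `java.nio.file`, stays private. This example uses the NIO backend so that it runs without any helper binary; on Windows, creating symbolic links may require extra privileges.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical facade: callers depend on this interface, never on how the work is carried out. */
interface LocalFileOps {
    void symlink(Path link, Path target) throws IOException;

    Path readlink(Path link) throws IOException;
}

/** One possible backend using only java.nio.file; a helper-binary backend could implement the same interface. */
final class NioLocalFileOps implements LocalFileOps {
    @Override
    public void symlink(Path link, Path target) throws IOException {
        Files.createSymbolicLink(link, target);
    }

    @Override
    public Path readlink(Path link) throws IOException {
        return Files.readSymbolicLink(link);
    }
}
```

With such a facade in place, locating and invoking the helper binary becomes an implementation detail that no caller can reach.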


[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2016-06-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319318#comment-15319318
 ] 

Colin Patrick McCabe commented on HADOOP-13223:
---

Hmm.  It's not clear to me why a DLL would be less prone to path problems than 
an EXE.  It seems like we should just be putting a version number on the EXE, 
so that we avoid these conflicts.  We have the same problem with libhadoop-- 
see HADOOP-11127.




[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2016-06-01 Thread john lilley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310888#comment-15310888
 ] 

john lilley commented on HADOOP-13223:
--

[~chliu], I completely understand how this came about.  Good engineering 
decisions often do not withstand the test of time.  I hope we agree that 
shell-command callouts, and winutils.exe in particular, are a less-than-ideal 
solution that should be replaced, although there may be obstacles to doing so.  
I don't know what shell callouts are actually performed throughout Hadoop (that 
is part of the problem: the inability to analyze external code dependencies), 
but if a solid API replacement were made and the use of shell callouts 
deprecated, this would at least establish a path to the eventual removal of 
winutils.




[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2016-06-01 Thread Chuan Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310835#comment-15310835
 ] 

Chuan Liu commented on HADOOP-13223:


I can add some context here. Back then:
# Native library inclusion was *optional* in Hadoop on both Windows and Linux.
# Accessing a Linux cluster from a Windows client was not supported.
# There was no ASF build of Hadoop on Windows (because Apache had no Windows CI 
machine).

All native and Java API gaps were addressed by calling external commands, 
because of 1). The original Hadoop on Windows promise was that the Windows 
implementation should never break the Linux side. We agreed to create 
"winutils" to supply the command line utilities missing on Windows when porting 
Hadoop to Windows. Later, when fixing some other file IO issues, we made the 
native library mandatory on Windows, but the existing "winutils.exe" was not 
replaced with JNI calls because of the amount of engineering work involved.

I read through your complaints. I think problems 1 and 2 are a distro issue, 
and an official Apache build that includes Windows binaries would help with 
them. [~ste...@apache.org] also provides a partial solution. Problems 3 to 6 
can be summarized as poor error messages when calling external commands on 
Windows. I agree with that, and there are various places where such error 
messages should be improved. However, I do not think removing "winutils.exe" is 
a fix for all your problems. It will likely just replace an ".exe" problem with 
a ".dll" problem.

That said, personally, I am also in favor of getting rid of "winutils.exe" and 
replacing the necessary calls with the JNI implementation. The "winutils" code 
is designed to have all main implementations in "libwinutils", so both the 
command line implementation "winutils.exe" and the JNI implementation 
"hadoop.dll" are surface-level wrappers that statically link to the same 
underlying library. On this front, it should not be too difficult to move all 
"winutils.exe" implementations into JNI calls.
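
The point above about poor error messages when calling external commands can be illustrated with a small sketch. This is not Hadoop's Shell class; `DescriptiveExec` and its `run` method are hypothetical names, and the only assumption is that the caller wants a launch failure to name the command that was attempted rather than ending at a bare ProcessBuilder frame.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper, not Hadoop's Shell class: adds context to launch failures. */
final class DescriptiveExec {
  private DescriptiveExec() {}

  /**
   * Runs the command and returns its exit code. If the process cannot be
   * started, the rethrown IOException names the command that was attempted
   * and keeps the original exception as its cause.
   */
  static int run(String... command) throws IOException, InterruptedException {
    List<String> cmd = Arrays.asList(command);
    Process process;
    try {
      // inheritIO() forwards the child's stdout/stderr to this JVM's streams.
      process = new ProcessBuilder(cmd).inheritIO().start();
    } catch (IOException e) {
      throw new IOException("Could not launch external command " + cmd
          + "; check that " + cmd.get(0) + " exists and is executable", e);
    }
    return process.waitFor();
  }
}
```

With a wrapper of this shape, a missing binary surfaces as a message containing the executable name, which is the connection that problem #6 says currently has to be made by experience.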


[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2016-06-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310735#comment-15310735
 ] 

Chris Nauroth commented on HADOOP-13223:


bq. I think that moving functions to a DLL could improve matters if the DLL was 
embedded in the jar itself as a resource.

There is a JIRA tracking this: HADOOP-11127.  There is a lot of discussion 
about the trade-offs there, though there is not yet consensus on how to proceed.




[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2016-06-01 Thread john lilley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310519#comment-15310519
 ] 

john lilley commented on HADOOP-13223:
--

The other bug that showed up was reported by my colleague Nathan:

In org.apache.hadoop.util, there's a function, getWinUtilsPath, that looks like 
this:
  public static final String getWinUtilsPath() {
    String winUtilsPath = null;
    try {
      if (WINDOWS) {
        winUtilsPath = getQualifiedBinPath("winutils.exe");
      }
    } catch (IOException ioe) {
      // The IOException is logged and swallowed, so callers receive null.
      LOG.error("Failed to locate the winutils binary in the hadoop binary path",
          ioe);
    }
    return winUtilsPath;
  }

Unfortunately, if HADOOP_HOME is set but is bogus, it returns null and you 
get the ambiguous "null exception" thrown.
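
For contrast, here is a minimal sketch of a fail-fast lookup. It is not a proposed patch to Hadoop: `WinUtilsLocator` and `locate` are hypothetical names, and the HADOOP_HOME value is passed in as a parameter so the behaviour is easy to exercise. The idea is simply that a bogus location produces an exception naming the path that was checked, instead of a null that fails later.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of Hadoop: fails loudly instead of returning null. */
final class WinUtilsLocator {
  private WinUtilsLocator() {}

  /**
   * Resolves bin/winutils.exe under the given home directory.
   *
   * @param hadoopHome value of HADOOP_HOME (may be null or bogus)
   * @throws FileNotFoundException naming the exact path that was checked
   */
  static Path locate(String hadoopHome) throws FileNotFoundException {
    if (hadoopHome == null || hadoopHome.trim().isEmpty()) {
      throw new FileNotFoundException(
          "HADOOP_HOME is unset; cannot locate winutils.exe");
    }
    Path candidate = Paths.get(hadoopHome, "bin", "winutils.exe").toAbsolutePath();
    if (!Files.isRegularFile(candidate)) {
      throw new FileNotFoundException(
          "winutils.exe not found at " + candidate + " (HADOOP_HOME=" + hadoopHome + ")");
    }
    return candidate;
  }
}
```

A caller would invoke `WinUtilsLocator.locate(System.getenv("HADOOP_HOME"))` and either get a usable path or an error message that states where the lookup went wrong.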




[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2016-06-01 Thread john lilley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310216#comment-15310216
 ] 

john lilley commented on HADOOP-13223:
--

Steve,
I think that moving functions to a DLL could improve matters if the DLL was 
embedded in the jar itself as a resource.  This is apparently not magic -- the 
DLL still must be extracted to disk and loaded -- but at least you can do this 
in a way that is free of PATH issues: 
http://stackoverflow.com/questions/1611357/how-to-make-a-jar-file-that-includes-dll-files
Of course, you get different issues about needing a valid temp space and 
possible collisions and race conditions.
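
The extract-then-load approach described above could look roughly like the following sketch. `EmbeddedLibraryLoader` is a hypothetical class and the resource name is whatever the jar author chose; nothing here reflects an existing Hadoop API. A uniquely named temp file is used to sidestep the collision concern, though it still needs a writable temp directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical loader; the resource name passed in is an assumption. */
final class EmbeddedLibraryLoader {
  private EmbeddedLibraryLoader() {}

  /** Copies the stream to a uniquely named temp file, avoiding name collisions. */
  static Path extract(InputStream in, String suffix) throws IOException {
    Path target = Files.createTempFile("embedded-lib-", suffix);
    // Best effort only: a library that is still loaded may not be deletable.
    target.toFile().deleteOnExit();
    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    return target;
  }

  /** Extracts a classpath resource and loads it by absolute path, bypassing PATH. */
  static void load(String resourceName, String suffix) throws IOException {
    try (InputStream in = EmbeddedLibraryLoader.class.getResourceAsStream(resourceName)) {
      if (in == null) {
        throw new IOException("Resource not found on classpath: " + resourceName);
      }
      System.load(extract(in, suffix).toAbsolutePath().toString());
    }
  }
}
```

Because `System.load` takes an absolute file name, the PATH and HADOOP_HOME lookups never come into play; the trade-off is the temp-space dependency already noted.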




[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2016-06-01 Thread john lilley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310203#comment-15310203
 ] 

john lilley commented on HADOOP-13223:
--

I think that the problem is something more than winutils.exe... NativeIO only 
goes so far in replacing shell commands like chmod and chown, and that's really 
the heart of the problem.  I think I was taught in college that shelling out to 
external commands was a bad idea, and well, now we can see why.  I think that a 
root resolution of the underlying issues would mean a search-and-replace 
mission, swapping shell commands for calls into an enhanced NativeIO that also 
covers chown.  But of course that in and of itself speaks volumes about the 
nature of the problem.  There is no _interface_ to these operations, so they 
are a blind spot for code quality and for the ability to refactor or analyze.
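
To make the "no interface" point concrete, here is a sketch of what a typed entry point for one such operation might look like. `PermissionOps` and `NioPermissionOps` are hypothetical names, and the implementation uses plain java.nio rather than NativeIO, so it only works where the file system exposes the "posix" attribute view (which the default Windows file system does not).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Hypothetical interface: one typed entry point instead of scattered shell calls. */
interface PermissionOps {
  /** Applies a symbolic permission string such as "rw-r--r--" to the file. */
  void setPermissions(Path file, String symbolic) throws IOException;
}

/** NIO-backed implementation; only valid where the "posix" attribute view exists. */
final class NioPermissionOps implements PermissionOps {
  @Override
  public void setPermissions(Path file, String symbolic) throws IOException {
    if (!file.getFileSystem().supportedFileAttributeViews().contains("posix")) {
      throw new UnsupportedOperationException(
          "POSIX permissions unavailable on " + file.getFileSystem());
    }
    Set<PosixFilePermission> perms = PosixFilePermissions.fromString(symbolic);
    Files.setPosixFilePermissions(file, perms);
  }
}
```

Once callers depend on an interface like this, every use of the operation is visible to the compiler and to refactoring tools, and a Windows-specific implementation can be swapped in without touching call sites.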




[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2016-06-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310197#comment-15310197
 ] 

Steve Loughran commented on HADOOP-13223:
-

we're still trying to stabilise 2.8.0 features; I'm one of the people holding 
things up with my S3a work.

I can do a quick build of it for you if you want, just so you can see how the 
failure handling has improved. You don't want to suffer the pain of building a 
windows release setup if you can avoid it.

As you note, all the winutils operations are being done in a Windows binary. By 
inference, they can all be done in a DLL. I don't think it will make the 
problems go away, but it could, possibly, lessen the pain. We've looked at 
moving the whole of RawLocalFileSystem to NIO; nobody has done it, and we 
suspect a couple of things won't be there, but again, it can only be a good 
thing. I also suspect the missing bits will be related to permissions and 
symlinks.
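
One way to find out which bits are missing on a given platform is to probe for them at runtime. The sketch below is an assumption-laden illustration, not part of RawLocalFileSystem: `NioCapabilityProbe` is a hypothetical class that checks the two areas named above using only standard java.nio calls.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical probe for the two areas suspected to be weak in NIO on Windows. */
final class NioCapabilityProbe {
  private NioCapabilityProbe() {}

  /** True if the default file system exposes POSIX-style permissions. */
  static boolean supportsPosixPermissions() {
    return FileSystems.getDefault().supportedFileAttributeViews().contains("posix");
  }

  /** Tries to create a symlink in a scratch directory; false if the platform refuses. */
  static boolean canCreateSymlinks() throws IOException {
    Path dir = Files.createTempDirectory("nio-probe-");
    Path target = Files.createFile(dir.resolve("target"));
    Path link = dir.resolve("link");
    try {
      Files.createSymbolicLink(link, target);
      return Files.isSymbolicLink(link);
    } catch (UnsupportedOperationException | IOException | SecurityException e) {
      // Unsupported file system, missing privilege, or a security manager veto.
      return false;
    } finally {
      // deleteIfExists removes the link itself, not the file it points to.
      Files.deleteIfExists(link);
      Files.deleteIfExists(target);
      Files.deleteIfExists(dir);
    }
  }
}
```

A file system implementation could consult probes like these once at startup and fall back to native code only for the operations that NIO cannot provide.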

ps, don't apologise for the tone, it's a pretty reasonable summary of the 
experience




[jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.

2016-06-01 Thread john lilley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310196#comment-15310196
 ] 

john lilley commented on HADOOP-13223:
--

OK, I made the title a bit less rude ;-)

> winutils.exe is a bug nexus and should be killed with an axe.
> -
>
> Key: HADOOP-13223
> URL: https://issues.apache.org/jira/browse/HADOOP-13223
> Project: Hadoop Common
> Issue Type: Improvement
> Components: bin
> Affects Versions: 2.6.0
> Environment: Microsoft Windows, all versions
> Reporter: john lilley
>
> winutils.exe was apparently created as a stopgap measure to allow Hadoop to 
> "work" on Windows platforms, because the NativeIO libraries aren't 
> implemented there.  Rather than building a DLL that makes native OS calls, 
> the creators of winutils.exe must have decided that it would be more 
> expedient to create an EXE to carry out file system operations in a 
> linux-like fashion.  Unfortunately, like many stopgap measures in software, 
> this one has persisted well beyond its expected lifetime and usefulness.  My 
> team creates software that runs on Windows and Linux, and winutils.exe is 
> probably responsible for 20% of all issues we encounter, both during 
> development and in the field.
> Problem #1 with winutils.exe is that it is simply missing from many popular 
> distros and/or the client-side software installation for said distros, when 
> supplied, fails to install winutils.exe.  Thus, as software developers, we 
> are forced to pick one version and distribute and install it with our 
> software.
> Which leads to problem #2: winutils.exe versions are not always compatible.  In 
> particular, MapR MUST have its winutils.exe in the system path, but doing so 
> breaks the Hadoop distro for every other Hadoop vendor.  This makes creating 
> and maintaining test environments that work with all of the Hadoop distros we 
> want to test unnecessarily tedious and error-prone.
> Problem #3 is that the mechanism by which you inform the Hadoop client 
> software where to find winutils.exe is poorly documented and fragile.  First, 
> it can be in the PATH.  If it is in the PATH, that is where it is found.  
> However, the documentation, such as it is, makes no mention of this, and 
> instead says that you should set the HADOOP_HOME environment variable, which 
> does NOT override the winutils.exe found in your system PATH.
> Which leads to problem #4: There is no logging that says where winutils.exe 
> was actually found and loaded.  Because of this, problems caused by finding 
> the wrong winutils.exe are extremely difficult to fix.
> Problem #5 is that most of the time, such as when accessing straight up HDFS 
> and YARN, one does not *need* winutils.exe.  But if it is missing, the log 
> messages complain about its absence.  When we are trying to diagnose an 
> obscure issue in Hadoop (of which there are many), the presence of this red 
> herring leads to all sorts of time wasted until someone on the team points 
> out that winutils.exe is not the problem, at least not this time.
> Problem #6 is that errors and stack traces from issues involving winutils.exe 
> are not helpful.  The Java stack trace ends at the ProcessBuilder call.  Only 
> through bitter experience is one able to connect the dots from 
> "ProcessBuilder is the last thing on the stack" to "something is wrong with 
> winutils.exe".
> Note that none of these involve running Hadoop on Windows.  They are only 
> encountered when using Hadoop client libraries to access a cluster from 
> Windows.
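
Problems #3 and #4 above come down to not knowing which winutils.exe is being picked up. Below is a minimal sketch of a standalone diagnostic, not Hadoop's own lookup code: it lists every winutils.exe reachable through PATH, then the one under HADOOP_HOME\bin. The function name find_winutils_candidates and the PATH-first ordering are assumptions for illustration; the ordering simply follows the report that a copy on the PATH is used even when HADOOP_HOME is set.

```python
import os
from pathlib import Path


def find_winutils_candidates(env=None, exe_name="winutils.exe"):
    """Return (source, path) pairs for every winutils.exe that can be located.

    Hypothetical helper, not part of Hadoop. PATH hits are listed first
    because the issue description reports that a copy on the PATH is the one
    that gets used, regardless of HADOOP_HOME.
    """
    env = os.environ if env is None else env
    candidates = []

    # Walk PATH in order; os.pathsep is ';' on Windows and ':' elsewhere.
    for entry in env.get("PATH", "").split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing separator
        candidate = Path(entry.strip('"')) / exe_name
        if candidate.is_file():
            candidates.append(("PATH", candidate))

    # The location the documentation points to: HADOOP_HOME\bin.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        candidate = Path(hadoop_home) / "bin" / exe_name
        if candidate.is_file():
            candidates.append(("HADOOP_HOME", candidate))

    return candidates


if __name__ == "__main__":
    found = find_winutils_candidates()
    if not found:
        print("no winutils.exe found on PATH or under HADOOP_HOME\\bin")
    for source, path in found:
        print(f"{source}: {path}")
```

Running this on a client machine before starting the Hadoop client shows whether more than one copy is visible, which is the situation described in problem #2. If several PATH entries are printed, the first one is the likeliest to be picked up, assuming the usual first-match search order.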


