[jira] [Commented] (HADOOP-7489) Hadoop logs errors upon startup on OS X 10.7
[ https://issues.apache.org/jira/browse/HADOOP-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271034#comment-13271034 ]

Allen Wittenauer commented on HADOOP-7489:
------------------------------------------

Java has its own internal version of Kerberos. In 1.6 and earlier, that version is very, very stupid about using naming services for auto-discovery of the realm and KDC information. You'll see similar weirdness even on non-OS X boxes when the krb5.conf doesn't explicitly list the realm information; the same configuration fix mentioned here applies there as well. This has been fixed in JRE 1.7. Allegedly.

> Hadoop logs errors upon startup on OS X 10.7
> --------------------------------------------
>
>                 Key: HADOOP-7489
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7489
>             Project: Hadoop Common
>          Issue Type: Bug
>         Environment: Mac OS X 10.7, Java 1.6.0_26
>            Reporter: Bryan Keller
>            Priority: Minor
>
> When starting Hadoop on OS X 10.7 (Lion) using start-all.sh, Hadoop logs the following errors:
> 2011-07-28 11:45:31.469 java[77427:1a03] Unable to load realm info from SCDynamicStore
> Hadoop does seem to function properly despite this.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
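The "configuration fix" kind of workaround mentioned above is commonly applied by pinning the realm and KDC explicitly, so the Java Kerberos stack never falls back to auto-discovery (SCDynamicStore on OS X). A hedged sketch; the realm and host names are placeholders, not values from this thread:

```shell
# Hypothetical workaround sketch (EXAMPLE.COM / kdc.example.com are
# placeholders): pin the realm and KDC via JVM system properties so the
# JRE's Kerberos auto-discovery is never consulted.
export HADOOP_OPTS="-Djava.security.krb5.realm=EXAMPLE.COM -Djava.security.krb5.kdc=kdc.example.com ${HADOOP_OPTS:-}"
echo "$HADOOP_OPTS"
```

Alternatively, the same pinning can live in krb5.conf under `[libdefaults]`/`[realms]`, which also quiets the non-OS X cases described above.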
[jira] [Resolved] (HADOOP-7489) Hadoop logs errors upon startup on OS X 10.7
[ https://issues.apache.org/jira/browse/HADOOP-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved HADOOP-7489.
--------------------------------------
    Resolution: Won't Fix

I'm closing this as Won't Fix since this is a known JRE bug and not particularly anything we can do about it in Hadoop.

> Hadoop logs errors upon startup on OS X 10.7
>                 Key: HADOOP-7489
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7489
[jira] [Updated] (HADOOP-8368) Use CMake rather than autotools to build native code
[ https://issues.apache.org/jira/browse/HADOOP-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated HADOOP-8368:
-------------------------------------
    Hadoop Flags: Incompatible change

Marking this as an incompatible change since it breaks the building of the native code on platforms where it currently worked out of the box.

> Use CMake rather than autotools to build native code
> ----------------------------------------------------
>
>                 Key: HADOOP-8368
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8368
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HADOOP-8368.001.patch, HADOOP-8368.005.patch, HADOOP-8368.006.patch, HADOOP-8368.007.patch, HADOOP-8368.008.patch, HADOOP-8368.009.patch, HADOOP-8368.010.patch, HADOOP-8368.012.half.patch, HADOOP-8368.012.patch, HADOOP-8368.012.rm.patch, HADOOP-8368.014.trimmed.patch, HADOOP-8368.015.trimmed.patch, HADOOP-8368.016.trimmed.patch, HADOOP-8368.018.trimmed.patch, HADOOP-8368.020.rm.patch, HADOOP-8368.020.trimmed.patch
>
> It would be good to use CMake rather than autotools to build the native (C/C++) code in Hadoop. Rationale:
>
> 1. automake depends on shell scripts, which often have problems running on different operating systems. It would be extremely difficult, and perhaps impossible, to use autotools under Windows; even if it were possible, it might require horrible workarounds like installing Cygwin. Even on Linux variants like Ubuntu 12.04, there are major build issues because /bin/sh is the Dash shell rather than the Bash shell, as it is in other Linux versions. It is currently impossible to build the native code under Ubuntu 12.04 because of this problem. CMake has robust cross-platform support, including Windows, and does not use shell scripts.
>
> 2. automake error messages are very confusing. For example, "autoreconf: cannot empty /tmp/ar0.4849: Is a directory" or "Can't locate object method path via package Autom4te..." are common error messages. In order to even start debugging automake problems you need to learn shell, m4, sed, and a bunch of other things. With CMake, all you have to learn is the syntax of CMakeLists.txt, which is simple. CMake can do all the stuff autotools can, such as making sure that required libraries are installed. There is a Maven plugin for CMake as well.
>
> 3. Different versions of autotools can have very different behaviors. For example, the version installed under openSUSE defaults to putting libraries in /usr/local/lib64, whereas the version shipped with Ubuntu 11.04 defaults to installing the same libraries under /usr/local/lib. (This is why the FUSE build is currently broken when using openSUSE.) This is another source of build failures and complexity, and if things go wrong you will often get an error message which is incomprehensible to normal humans (see point #2). CMake lets a CMakeLists.txt declare, via cmake_minimum_required, the minimum CMake version it will accept, and CMake maintains strict backwards compatibility between versions. This prevents build bugs due to version skew.
>
> 4. autoconf, automake, and libtool are large and rather slow. This adds to build time.
>
> For all these reasons, I think we should switch to CMake for compiling native (C/C++) code in Hadoop.
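The rationale above (version pinning, dependency probing, no shell) can be illustrated with a minimal CMakeLists.txt sketch. All target and file names here are hypothetical, not taken from Hadoop's actual build files:

```cmake
# Hypothetical minimal build sketch; names are illustrative, not Hadoop's.
cmake_minimum_required(VERSION 2.8)     # declare a version floor, per point 3
project(nativedemo C)

# Dependency probing, per "making sure that required libraries are installed"
include(CheckLibraryExists)
check_library_exists(z deflate "" HAVE_LIBZ)
if(NOT HAVE_LIBZ)
  message(FATAL_ERROR "zlib is required to build the native code")
endif()

add_library(nativedemo SHARED demo.c)   # demo.c is a placeholder source file
target_link_libraries(nativedemo z)
```

The same file drives the generator on Windows (Visual Studio), OS X, and Linux, which is the cross-platform point being argued.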
[jira] [Commented] (HADOOP-8368) Use CMake rather than autotools to build native code
[ https://issues.apache.org/jira/browse/HADOOP-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283682#comment-13283682 ]

Allen Wittenauer commented on HADOOP-8368:
------------------------------------------

Other platforms now require cmake to be installed, whereas before they didn't. That's an incompatible change in my book.

> Use CMake rather than autotools to build native code
>                 Key: HADOOP-8368
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8368
[jira] [Updated] (HADOOP-8368) Use CMake rather than autotools to build native code
[ https://issues.apache.org/jira/browse/HADOOP-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated HADOOP-8368:
-------------------------------------
    Hadoop Flags: Incompatible change

> Use CMake rather than autotools to build native code
>                 Key: HADOOP-8368
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8368
[jira] [Updated] (HADOOP-7147) setnetgrent in native code is not portable
[ https://issues.apache.org/jira/browse/HADOOP-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated HADOOP-7147:
-------------------------------------
    Attachment: HADOOP-7147.patch

> setnetgrent in native code is not portable
> ------------------------------------------
>
>                 Key: HADOOP-7147
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7147
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Allen Wittenauer
>         Attachments: HADOOP-7147.patch, hadoop-7147.patch
>
> HADOOP-6864 uses the setnetgrent function in a way which is not compatible with BSD APIs, where the call returns void rather than int. This prevents the native libs from building on OS X, for example.
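The void-vs-int divergence described above can be papered over with a thin wrapper. A sketch only: the helper name is made up, and detecting the int-returning variant via `__GLIBC__` is an assumption about one reasonable way to do it, not Hadoop's actual patch:

```c
/* Sketch: wrap the divergent setnetgrent() signatures behind one helper.
 * glibc declares setnetgrent() as returning int; BSD-derived libcs
 * (including OS X) declare it void, so the only portable contract is
 * "call it, then iterate with getnetgrent()". Helper name is hypothetical. */
#include <netdb.h>

static int portable_setnetgrent(const char *netgroup) {
#if defined(__GLIBC__)
    return setnetgrent(netgroup);   /* int: nonzero on success */
#else
    setnetgrent(netgroup);          /* void on BSD APIs */
    return 1;                       /* assume success; failures surface in getnetgrent() */
#endif
}
```

Callers then use the wrapper unconditionally, keeping the `#if` out of the main lookup logic.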
[jira] [Updated] (HADOOP-7824) Native IO uses wrong constants almost everywhere
[ https://issues.apache.org/jira/browse/HADOOP-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated HADOOP-7824:
-------------------------------------
    Target Version/s: 2.0.0-alpha, 1.1.0, 3.0.0  (was: 2.0.0-alpha, 3.0.0)
   Affects Version/s: 1.0.3
             Summary: Native IO uses wrong constants almost everywhere  (was: Native IO uses wrong constants on OS X)

Changing this to reflect reality. NativeIO.java's definitions mismatch RHEL 6 as well, which means that this is untrustworthy even on Linux.

> Native IO uses wrong constants almost everywhere
> ------------------------------------------------
>
>                 Key: HADOOP-7824
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7824
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 0.20.204.0, 0.20.205.0, 1.0.3, 0.23.0
>         Environment: Mac OS X
>            Reporter: Dmytro Shteflyuk
>            Assignee: Todd Lipcon
>              Labels: hadoop
>         Attachments: HADOOP-7824.patch
>
> Constants like O_CREAT, O_EXCL, etc. have different values on OS X.
[jira] [Updated] (HADOOP-7824) Native IO uses wrong constants almost everywhere
[ https://issues.apache.org/jira/browse/HADOOP-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated HADOOP-7824:
-------------------------------------
    Attachment: HADOOP-7824.patch

This patch generates a bit-field translator based upon what Hadoop thinks is true in NativeIO.java vs. what fcntl.h and friends know is real. This isn't fully cooked into the build process yet, however.

> Native IO uses wrong constants almost everywhere
>                 Key: HADOOP-7824
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7824
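The "bit field translator" idea can be sketched like this. The `JAVA_O_*` values below are hypothetical stand-ins for whatever encoding the Java side fixes on (here, Linux's numbers); the point is that the mapping to the local platform's real flags happens in C against fcntl.h, never by trusting the Java-side integers:

```c
/* Sketch of a flag translator: map a fixed Java-side encoding of open(2)
 * flags (JAVA_O_* values are hypothetical) onto whatever this platform's
 * fcntl.h actually defines, instead of passing Java's ints through raw. */
#include <fcntl.h>

#define JAVA_O_CREAT  0x0040   /* assumed Java-side values (Linux's numbers) */
#define JAVA_O_EXCL   0x0080
#define JAVA_O_APPEND 0x0400

static int translate_open_flags(int java_flags) {
    int native_flags = 0;
    if (java_flags & JAVA_O_CREAT)  native_flags |= O_CREAT;
    if (java_flags & JAVA_O_EXCL)   native_flags |= O_EXCL;
    if (java_flags & JAVA_O_APPEND) native_flags |= O_APPEND;
    return native_flags;
}
```

Because the right-hand side comes from the local headers, the same translator compiles to the correct values on Linux, OS X, or Solaris; pre-generating the table (as the patch does) just lets the compiler constant-fold it.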
[jira] [Updated] (HADOOP-7824) Native IO uses wrong constants almost everywhere
[ https://issues.apache.org/jira/browse/HADOOP-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated HADOOP-7824:
-------------------------------------
         Description: Constants like O_CREAT, O_EXCL, etc. have different values on OS X and many other operating systems.  (was: Constants like O_CREAT, O_EXCL, etc. have different values on OS X.)
         Environment: Mac OS X, Linux, Solaris, ...  (was: Mac OS X)
    Target Version/s: 2.0.0-alpha, 1.1.0, 3.0.0  (was: 1.1.0, 2.0.0-alpha, 3.0.0)

> Native IO uses wrong constants almost everywhere
>                 Key: HADOOP-7824
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7824
[jira] [Updated] (HADOOP-7824) Native IO uses wrong constants almost everywhere
[ https://issues.apache.org/jira/browse/HADOOP-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated HADOOP-7824:
-------------------------------------
          Environment: Mac OS X, Linux, Solaris, Windows, ...  (was: Mac OS X, Linux, Solaris, ...)
     Target Version/s: 2.0.0-alpha, 1.1.0, 3.0.0  (was: 1.1.0, 2.0.0-alpha, 3.0.0)
    Affects Version/s: 3.0.0, 2.0.0-alpha

> Native IO uses wrong constants almost everywhere
>                 Key: HADOOP-7824
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7824
[jira] [Commented] (HADOOP-7824) Native IO uses wrong constants almost everywhere
[ https://issues.apache.org/jira/browse/HADOOP-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402766#comment-13402766 ]

Allen Wittenauer commented on HADOOP-7824:
------------------------------------------

I suspect that's going to be a lot slower. The C compiler should be able to optimize the heck out of pre-generated code, rather than doing it on the fly. Given that the whole reason why this code exists is speed...

> Native IO uses wrong constants almost everywhere
>                 Key: HADOOP-7824
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7824
[jira] [Commented] (HADOOP-7824) Native IO uses wrong constants almost everywhere
[ https://issues.apache.org/jira/browse/HADOOP-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402784#comment-13402784 ]

Allen Wittenauer commented on HADOOP-7824:
------------------------------------------

(Well, that and the fact that Java actively works against you if you want to use OS resources.)

> Native IO uses wrong constants almost everywhere
>                 Key: HADOOP-7824
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7824
[jira] [Commented] (HADOOP-7147) setnetgrent in native code is not portable
[ https://issues.apache.org/jira/browse/HADOOP-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402856#comment-13402856 ]

Allen Wittenauer commented on HADOOP-7147:
------------------------------------------

It just occurred to me that this latest patch is incomplete anyway: getgrouplist() has weird behavior on OS X, and I just forgot to forward-port that from what I'm running in 1.x. In any case, this patch is here for those that need it. I'm not really planning on working on it to get it commit-ready. (Note this bug is marked as Won't Fix.)

> setnetgrent in native code is not portable
>                 Key: HADOOP-7147
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7147
[jira] [Commented] (HADOOP-7147) setnetgrent in native code is not portable
[ https://issues.apache.org/jira/browse/HADOOP-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402872#comment-13402872 ]

Allen Wittenauer commented on HADOOP-7147:
------------------------------------------

BTW, Colin, see my note in the other bug about using the get*_r routines on Solaris. Now that we're in cmake-ville, I haven't figured out how to do OS or compiler detection to do the correct thing on non-gcc/non-Linux. (E.g., on Darwin we should *really* be passing -framework JavaVM, amongst other flags, when building libhadoop.dylib so we get linked properly, can build a fat binary, etc.)

> setnetgrent in native code is not portable
>                 Key: HADOOP-7147
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7147
[jira] [Commented] (HADOOP-7147) setnetgrent in native code is not portable
[ https://issues.apache.org/jira/browse/HADOOP-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403147#comment-13403147 ]

Allen Wittenauer commented on HADOOP-7147:
------------------------------------------

I'm thinking about compiler flags, so CHECK_FUNCTION_EXISTS won't work here (i.e., -xCC, -xc99, -xstrconst, -xparallel, etc.). We also need to pop all the GNU-specific bits into protected areas so as not to infect everything else.

> setnetgrent in native code is not portable
>                 Key: HADOOP-7147
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7147
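For compiler flags specifically, CMake has a dedicated probe, which also gives one way to fence off the GNU-specific bits mentioned above. A sketch, not Hadoop's actual build logic:

```cmake
# Sketch: probe compiler *flags* rather than functions, so Sun Studio
# options like -xc99 are only added when the active compiler accepts them.
include(CheckCCompilerFlag)
check_c_compiler_flag("-xc99" HAVE_XC99)
if(HAVE_XC99)
  set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -xc99")
endif()

# Keep GNU-only bits in a protected area instead of infecting every compiler.
if(CMAKE_C_COMPILER_ID STREQUAL "GNU")
  add_definitions(-D_GNU_SOURCE)
endif()
```

`CMAKE_SYSTEM_NAME` (e.g. "Darwin", "SunOS") can be tested the same way for the OS-detection half of the problem.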
[jira] [Commented] (HADOOP-8737) cmake: always use JAVA_HOME to find libjvm.so, jni.h, jni_md.h
[ https://issues.apache.org/jira/browse/HADOOP-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443615#comment-13443615 ]

Allen Wittenauer commented on HADOOP-8737:
------------------------------------------

Just a heads up that this completely breaks on OS X, and likely on other non-Linux systems.

> cmake: always use JAVA_HOME to find libjvm.so, jni.h, jni_md.h
> --------------------------------------------------------------
>
>                 Key: HADOOP-8737
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8737
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 2.2.0-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HADOOP-8737.001.patch
>
> We should always use the {{libjvm.so}}, {{jni.h}}, and {{jni_md.h}} under {{JAVA_HOME}}, rather than trying to look for them in system paths. Since we compile with Maven, we know that we'll have a valid {{JAVA_HOME}} at all times. There is no point digging in system paths, and it can lead to host contamination if the user has multiple JVMs installed.
[jira] [Commented] (HADOOP-8737) cmake: always use JAVA_HOME to find libjvm.so, jni.h, jni_md.h
[ https://issues.apache.org/jira/browse/HADOOP-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443690#comment-13443690 ]

Allen Wittenauer commented on HADOOP-8737:
------------------------------------------

To be more concrete: those paths don't exist because that isn't where the Java lib directory lives. For systems that support multiple architectures/bitnesses (OS X, Solaris, etc.), it will definitely pick the wrong library.

> cmake: always use JAVA_HOME to find libjvm.so, jni.h, jni_md.h
>                 Key: HADOOP-8737
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8737
[jira] [Commented] (HADOOP-8737) cmake: always use JAVA_HOME to find libjvm.so, jni.h, jni_md.h
[ https://issues.apache.org/jira/browse/HADOOP-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443699#comment-13443699 ]

Allen Wittenauer commented on HADOOP-8737:
------------------------------------------

You'll note that I'm not giving any official vote on it. Commit or not, I don't really care; I'm just pointing out that this is another portability break, in case someone cares.

As for my patches: I've posted several variations of various parts of my portability patches over the years, and at the time of this writing I really have no intention of expending the effort to update them on JIRA again. While "patches accepted" is the common mantra, "patches committed" is not. I don't have someone I sit next to whom I can tap on the shoulder to blanket +1 them, especially since portability work is tricky and there are lots of subtle bugs. (For example, the group code in libhadoop on OS X is quite entertaining...)

In the case of this one, like a lot of the other patches that blatantly break any attempt at portability, I'll just rip it out and replace it with my own code. I already replace other parts of the system with more functional equivalents (like the krb5 filter...), so one more isn't going to hurt.

> cmake: always use JAVA_HOME to find libjvm.so, jni.h, jni_md.h
>                 Key: HADOOP-8737
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8737
[jira] [Commented] (HADOOP-8781) hadoop-config.sh should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH
[ https://issues.apache.org/jira/browse/HADOOP-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453384#comment-13453384 ]

Allen Wittenauer commented on HADOOP-8781:
------------------------------------------

Be aware that this change will likely have side-effects for non-Java code.

> hadoop-config.sh should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH
> ----------------------------------------------------------------
>
>                 Key: HADOOP-8781
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8781
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: scripts
>    Affects Versions: 2.0.2-alpha
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>             Fix For: 1.2.0, 2.0.2-alpha
>         Attachments: HADOOP-8781-branch1.patch, HADOOP-8781-branch1.patch, HADOOP-8781.patch, HADOOP-8781.patch
>
> The Snappy .so fails to load properly if LD_LIBRARY_PATH does not include the path where the Snappy .so is. This is observed in setups that don't have an independent Snappy installation (not installed by Hadoop).
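The side-effect being flagged can be sketched as follows; the native-library path is a hypothetical example, not a value from the patch:

```shell
# Sketch of the flagged side-effect (path is hypothetical). Once a wrapper
# script exports this, every child process it spawns -- Java or not --
# searches the JNI library directory first, which can shadow system
# libraries for non-Java code such as streaming tasks or shell utilities.
JAVA_LIBRARY_PATH="${JAVA_LIBRARY_PATH:-/opt/hadoop/lib/native}"
export LD_LIBRARY_PATH="$JAVA_LIBRARY_PATH${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```

The dynamic loader consults LD_LIBRARY_PATH before the default system directories, so any same-named .so in the prepended directory wins for every descendant process.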
[jira] [Commented] (HADOOP-8781) hadoop-config.sh should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH
[ https://issues.apache.org/jira/browse/HADOOP-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453386#comment-13453386 ] Allen Wittenauer commented on HADOOP-8781: -- (and yes, I'd -1 this if anyone outside your hallway was given a chance to review stuff before it got committed.)
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455495#comment-13455495 ] Allen Wittenauer commented on HADOOP-8806: -- bq. However, snappy can't be loaded from this directory unless LD_LIBRARY_PATH is set to include this directory Or, IIRC, dlopen will look in the shared library's run path (-rpath for those using GNU ld, -R for just about everyone else). This is the preferred way to deal with this outside of Java. See also the $ORIGIN 'macro', which makes the path dynamic based upon the executable's location. With modern linkers, these features mean there is really no reason to hard-code any paths or set LD_LIBRARY_PATH unless you are absolutely doing something crazy. libhadoop.so: search java.library.path when calling dlopen -- Key: HADOOP-8806 URL: https://issues.apache.org/jira/browse/HADOOP-8806 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Priority: Minor libhadoop calls {{dlopen}} to load {{libsnappy.so}} and {{libz.so}}. These libraries can be bundled in the {{$HADOOP_ROOT/lib/native}} directory. For example, the {{-Dbundle.snappy}} build option copies {{libsnappy.so}} to this directory. However, snappy can't be loaded from this directory unless {{LD_LIBRARY_PATH}} is set to include this directory. Should we also search {{java.library.path}} when loading these libraries?
[jira] [Commented] (HADOOP-8797) automatically detect JAVA_HOME on Linux, report native lib path similar to class path
[ https://issues.apache.org/jira/browse/HADOOP-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455498#comment-13455498 ] Allen Wittenauer commented on HADOOP-8797: -- bq. iterate common java locations on Linux starting with Java7 down to Java6 FWIW, -1. automatically detect JAVA_HOME on Linux, report native lib path similar to class path - Key: HADOOP-8797 URL: https://issues.apache.org/jira/browse/HADOOP-8797 Project: Hadoop Common Issue Type: Improvement Environment: Linux Reporter: Gera Shegalov Priority: Trivial Attachments: HADOOP-8797.patch Enhancement 1) iterate common java locations on Linux starting with Java7 down to Java6 Enhancement 2) hadoop jnipath to print java.library.path similar to hadoop classpath
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455515#comment-13455515 ] Allen Wittenauer commented on HADOOP-8806: -- That's why $ORIGIN is a way out of this. At install time, build a symlink to a known path to the out-of-the-way location. bq. you cannot link a .a into a .so Sure you can. You can always use ar to pull out the objects and then include them into your own library.
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455519#comment-13455519 ] Allen Wittenauer commented on HADOOP-8806: -- (p.s., this is pretty much what the compiler does when you statically link...)
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455527#comment-13455527 ] Allen Wittenauer commented on HADOOP-8806: -- The problem with LD_LIBRARY_PATH is that if you are running something that is not Java, you may accidentally introduce a different/conflicting library than the one the compiled program is expecting. That's going to lead to some very strange errors for the user. The other possibility is that the end user will override LD_LIBRARY_PATH themselves, which puts us back at the original problem.
[jira] [Commented] (HADOOP-8806) libhadoop.so: search java.library.path when calling dlopen
[ https://issues.apache.org/jira/browse/HADOOP-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1342#comment-1342 ] Allen Wittenauer commented on HADOOP-8806: -- It's pretty clear that I'm not making my point given the summary, so I'm just going to let it drop and prepare yet another local patch to back this total mess out after it inevitably gets committed.
[jira] [Reopened] (HADOOP-8187) Improve the discovery of the jvm library during the build process
[ https://issues.apache.org/jira/browse/HADOOP-8187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer reopened HADOOP-8187: -- Description of how to improve this is in the linked jiras. Improve the discovery of the jvm library during the build process - Key: HADOOP-8187 URL: https://issues.apache.org/jira/browse/HADOOP-8187 Project: Hadoop Common Issue Type: Improvement Reporter: Devaraj Das Improve the discovery of the jvm library during the build of native libraries/libhdfs/fuse-dfs, etc. A couple of different ways are currently used (discussed in HADOOP-6924). We should clean this part up and also consider builds of native stuff on OSX.
[jira] [Commented] (HADOOP-8187) Improve the discovery of the jvm library during the build process
[ https://issues.apache.org/jira/browse/HADOOP-8187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13462335#comment-13462335 ] Allen Wittenauer commented on HADOOP-8187: -- bq. it seems like a non-portable, Sun-specific way of inferring the location of the JRE Right. Which is why I said at that point you'd be porting to different JVMs. IBM has a similar property (whose name escapes me at the moment). We *should* have a fallback method in case we can't find it, the property doesn't exist, whatever. But my experience, at least on a few platforms, is that this is significantly more reliable than $JAVA_HOME or doing finds or any of the other weird things we've tried in the past, especially on multi-arch JVMs where multiple libraries are generally present. (The -d param isn't just for decoration, folks...) Windows appears to be another (not surprising) outlier. bq. Are you suggesting that MacOS users recompile the JDK itself? Apple had a special agreement with Sun that resulted in a lot of chaos with regard to file locations and even how to build JNI code in order to fit it into the NeXTSTEP/OpenStep/Darwin mold. The normal rules do not apply, and treating it as if they do usually ends in tears. I haven't had a chance to look at Mountain Lion to see if things are any better/more standard. I suspect not.
[jira] [Commented] (HADOOP-7713) dfs -count -q should label output column
[ https://issues.apache.org/jira/browse/HADOOP-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13476693#comment-13476693 ] Allen Wittenauer commented on HADOOP-7713: -- I'd rather save -h for human-readable output, for whenever someone commits code that changes numbers like 3298534883328 into 3TB. Perhaps this should be changed to -header instead? dfs -count -q should label output column Key: HADOOP-7713 URL: https://issues.apache.org/jira/browse/HADOOP-7713 Project: Hadoop Common Issue Type: Improvement Reporter: Nigel Daley Assignee: Jonathan Allen Priority: Trivial Labels: newbie Attachments: HADOOP-7713.patch, HADOOP-7713.patch, HADOOP-7713.patch These commands should label the output columns: {code} hadoop dfs -count dir...dir hadoop dfs -count -q dir...dir {code} Current output of the 2nd command above: {code} % hadoop dfs -count -q /user/foo /tmp none inf 9569 9493 6372553322 hdfs://nn1.bar.com/user/foo none inf 101 2689 209349812906 hdfs://nn1.bar.com/tmp {code} It is not obvious what these columns mean.
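For reference, the human-readable rendering the comment wants to reserve -h for is what GNU coreutils' numfmt already does on the command line (a sketch; dfs -count itself had no such flag at the time):

```shell
# 3298534883328 bytes is exactly 3 TiB; a -h flag would render it with an
# IEC suffix (T) instead of the raw byte count shown by -count -q today.
raw=3298534883328
numfmt --to=iec "$raw"
```

This is also why overloading -h for "header" would be surprising: Unix tools (ls, df, du) consistently use -h for human-readable sizes.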
[jira] [Commented] (HADOOP-8970) Need a different environment variable or configuration that states where local temporary files are stored than hadoop.tmp.dir
[ https://issues.apache.org/jira/browse/HADOOP-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13484119#comment-13484119 ] Allen Wittenauer commented on HADOOP-8970: -- A bit of history... hadoop.tmp.dir defaults to /tmp to make it easier to run QA tests and to get something up quickly. On real systems, this should be one of the first things changed. On the user side... I think one of the fundamental problems is that end users see 'hadoop.tmp.dir' and think "Hey, I have some temporary files and I'm using Hadoop! This must be the place!" I've been thinking more and more about changing hadoop.tmp.dir during task execution to be the same value as mapred.child.tmp, which is what users are supposed to use. The other thing is that hadoop.tmp.dir should just get replaced with hadoop-daemon.tmp.dir so that it's perfectly clear what the intent of this variable actually is. Need a different environment variable or configuration that states where local temporary files are stored than hadoop.tmp.dir - Key: HADOOP-8970 URL: https://issues.apache.org/jira/browse/HADOOP-8970 Project: Hadoop Common Issue Type: Improvement Components: conf Reporter: Robert Justice I'm finding that hadoop.tmp.dir is used as a base directory in the configuration of working directories for many other Hadoop sub-components (mapred, hdfs, hue, etc.) and that it directs where the Hadoop client stores some local temporary files, as well as temporary files on HDFS. Users may be dealing with tight space in /tmp. In order to move where job setup files, hive files, hue files, etc. are locally stored, they have to create a new directory on HDFS (i.e. /temp) and local directories on another filesystem, and make sure permissions are set up properly in HDFS and for the local filesystem on all the nodes across the cluster.
I'm wondering if it would be better to have a hadoop.local.tmp.dir that is configurable at the client level to say where local files are kept, and break that out from hadoop.tmp.dir? I know this is a major change, but I thought I would suggest it.
[jira] [Commented] (HADOOP-8456) Support spaces in user names and group names in results returned via winutils
[ https://issues.apache.org/jira/browse/HADOOP-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13484182#comment-13484182 ] Allen Wittenauer commented on HADOOP-8456: -- How do Cygwin, MS SUA, etc. handle this problem? We should do the same thing on Windows in order to maintain compatibility with those systems. Support spaces in user names and group names in results returned via winutils - Key: HADOOP-8456 URL: https://issues.apache.org/jira/browse/HADOOP-8456 Project: Hadoop Common Issue Type: Bug Components: native Affects Versions: 1-win Reporter: Chuan Liu Assignee: Ivan Mitic Priority: Minor Attachments: HADOOP-8456.branch-1-win.spaces.patch When parsing results returned by ‘ls’, we made the implicit assumption that user and group names cannot contain spaces. On Linux, spaces are not allowed in user names and group names. This is not the case for Windows. We need to find a way to fix the problem for Windows.
[jira] [Commented] (HADOOP-8456) Support spaces in user names and group names in results returned via winutils
[ https://issues.apache.org/jira/browse/HADOOP-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486234#comment-13486234 ] Allen Wittenauer commented on HADOOP-8456: -- bq. Generally, we want to natively integrate Hadoop with Windows, in a sense that Windows user/group information flows thru Hadoop. bq. Let me know if this makes sense. It does, but I'm just trying to get a measure of what the user expectation/experience is going to be for those who are going to use this interface. (Which, I suspect, is what Daryn's questions are about as well.) 'ls' is a UNIXy thing, which is why I asked specifically about Cygwin and SUA. If Windows users are expecting different output, it may be worthwhile to implement a dir subcommand. In other words, we should implement 'expected behavior', not necessarily what is convenient if there is a collision. (But convenience can always be provided via an option...) So if Cygwin/SUA show names with spaces by default, that's what we should do. If they provide an option that puts separators in place, that's what we should do. Consistency is key for (inter)operability.
[jira] [Commented] (HADOOP-9022) Hadoop distcp tool fails to copy file if -m 0 specified
[ https://issues.apache.org/jira/browse/HADOOP-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494685#comment-13494685 ] Allen Wittenauer commented on HADOOP-9022: -- When -m 0 is specified, it should probably just call copy in a serial fashion across the list and not run a MapReduce job. After all, setting the number of maps to zero implies that one doesn't want a job executed at all. Hadoop distcp tool fails to copy file if -m 0 specified --- Key: HADOOP-9022 URL: https://issues.apache.org/jira/browse/HADOOP-9022 Project: Hadoop Common Issue Type: Bug Affects Versions: 0.23.1, 0.23.3, 0.23.4 Reporter: Haiyang Jiang When trying to copy a file using distcp on H23, if -m 0 is specified, distcp will just spawn 0 mapper tasks and the file will not be copied. This used to work before H23: even when -m 0 was specified, distcp would always copy the files. Checked the code of DistCp.java. Before the rewrite, it set the number of maps to at least 1: job.setNumMapTasks(Math.max(numMaps, 1)); But in the newest code, it just takes the input from the user: job.getConfiguration().set(JobContext.NUM_MAPS, String.valueOf(inputOptions.getMaxMaps()));
[jira] [Commented] (HADOOP-9019) KerberosAuthenticator.doSpnegoSequence(..) should create a HTTP principal with hostname everytime
[ https://issues.apache.org/jira/browse/HADOOP-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494690#comment-13494690 ] Allen Wittenauer commented on HADOOP-9019: -- I seem to recall that using IP addresses in principals was a big no-no, since many clients will do a reverse lookup as part of the validation sequence. (This is why one of the most effective ways to break Kerberos is via DNS MITM attacks.) In other words, using the FQDN here is more of a Kerberos thing than a Hadoop thing. KerberosAuthenticator.doSpnegoSequence(..) should create a HTTP principal with hostname everytime -- Key: HADOOP-9019 URL: https://issues.apache.org/jira/browse/HADOOP-9019 Project: Hadoop Common Issue Type: Bug Reporter: Vinay In KerberosAuthenticator.doSpnegoSequence(..) the following line of code will just create a principal of the form HTTP/host: {code}String servicePrincipal = KerberosUtil.getServicePrincipal(HTTP, KerberosAuthenticator.this.url.getHost());{code} but url.getHost() is not guaranteed to return a hostname. If the URL contains an IP, it just returns the IP. For SPNEGO authentication the principal should always be created with the hostname. This code should be something like the following, which will consult /etc/hosts to get the hostname: {code}String hostname = InetAddress.getByName( KerberosAuthenticator.this.url.getHost()).getHostName(); String servicePrincipal = KerberosUtil.getServicePrincipal(HTTP, hostname);{code}
[jira] [Commented] (HADOOP-9019) KerberosAuthenticator.doSpnegoSequence(..) should create a HTTP principal with hostname everytime
[ https://issues.apache.org/jira/browse/HADOOP-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13498206#comment-13498206 ] Allen Wittenauer commented on HADOOP-9019: -- Can't you just use the _HOST macro instead? (or is that only in my branch? I can't remember what is supported by the Apache version anymore...)
[jira] [Commented] (HADOOP-9054) Add AuthenticationHandler that uses Kerberos but allows for an alternate form of authentication for browsers
[ https://issues.apache.org/jira/browse/HADOOP-9054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13499512#comment-13499512 ] Allen Wittenauer commented on HADOOP-9054: -- On the Hadoop side, the authentication handler is already pluggable. So it looks like you want to make them chainable? Also keep in mind that on the Hadoop side, the user is going to get bounced around a lot. As a result, they might need to re-authenticate on every host if one isn't careful about the implementation. Add AuthenticationHandler that uses Kerberos but allows for an alternate form of authentication for browsers Key: HADOOP-9054 URL: https://issues.apache.org/jira/browse/HADOOP-9054 Project: Hadoop Common Issue Type: New Feature Components: security Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 1.2.0, 2.0.3-alpha It would be useful for some Oozie users if, when using Kerberos, browser access to the Oozie web UI could be authenticated in a different way (w/o Kerberos). This may be useful for other projects using Hadoop-Auth, so this feature is to add a new AuthenticationHandler that uses Kerberos by default, unless a browser (user-agents are configurable) is used, in which case some other form of authentication can be used.
[jira] [Commented] (HADOOP-9054) Add AuthenticationHandler that uses Kerberos but allows for an alternate form of authentication for browsers
[ https://issues.apache.org/jira/browse/HADOOP-9054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500456#comment-13500456 ] Allen Wittenauer commented on HADOOP-9054: -- Maybe this is a dumb question, but... why does Apache Hadoop need to provide this code, given that this is already pluggable, allowing users to provide their own bits now? In other words, if they already have the need to provide two (essentially) incompatible mechanisms, what prevents them?
[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature
[ https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505080#comment-13505080 ] Allen Wittenauer commented on HADOOP-8989: -- Has anyone studied the impact on the NN yet? hadoop dfs -find feature Key: HADOOP-8989 URL: https://issues.apache.org/jira/browse/HADOOP-8989 Project: Hadoop Common Issue Type: New Feature Reporter: Marco Nicosia Assignee: Jonathan Allen Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch Both sysadmins and users make frequent use of the unix 'find' command, but Hadoop has no correlate. Without this, users are writing scripts which make heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs -lsr is somewhat taxing on the NameNode, and a really slow experience on the client side. Possibly an in-NameNode find operation would be only a bit more taxing on the NameNode, but significantly faster from the client's point of view? The minimum set of options I can think of which would make a Hadoop find command generally useful is (in priority order): * -type (file or directory, for now) * -atime/-ctime/-mtime (... and -creationtime?) (both + and - arguments) * -print0 (for piping to xargs -0) * -depth * -owner/-group (and -nouser/-nogroup) * -name (allowing for shell pattern, or even regex?) * -perm * -size One possible special case, but could possibly be really cool if it ran from within the NameNode: * -delete The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow. Lower priority, some people do use operators, mostly to execute -or searches such as: * find / \(-nouser -or -nogroup\) Finally, I thought I'd include a link to the [Posix spec for find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html]
[jira] [Commented] (HADOOP-9082) Select and document a platform-independent scripting language for use in Hadoop environment
[ https://issues.apache.org/jira/browse/HADOOP-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507163#comment-13507163 ] Allen Wittenauer commented on HADOOP-9082: -- (I know this is mostly going to get ignored because a) it's from me, b) it's more than 3 lines, and c) we've already proven that we only care about Linux despite people wanting support for other platforms, but here we go anyway.) While I can understand the build-time issues, I'm not sure I understand the run-time issues. If you are running on a system that doesn't have libhadoop or want to launch a task, you're going to hit a fork() and that's going to call bash (or potentially sh). Or are we planning on replacing taskjvm.sh as well? So the bash requirement doesn't go away. At run time, the whole purpose of these scripts is to launch Java. That's it. The problem we have is that our current scripts are extremely convoluted, wrap into themselves, and fundamentally aren't written very well. Arguing that we could make our launcher scripts object-oriented or debug them in an IDE seems like we're expecting to raise the complexity to even more ludicrous levels. One thing I'm very curious about is whether we'll lose the ${BASH_SOURCE} functionality, something I consider absolutely critical, by moving to Python. (It allows one to run without setting *any* environment variables. I think I submitted that as a patch years ago, but well...) Let's say we pick Python. Which version are we going to target? From a support perspective, we could very easily end up asking about not only the Java version but the Python version. Do we really want that? bq. The alternative would be to maintain two complete suites of scripts, one for Linux and one for Windows (and perhaps others in the future). This is what most projects do that have Windows and UNIX functionality, from what I've seen.
This is because things are in different locations, use different delimiters, etc., and if you merge them, you end up with a lot of "if this then that, or if this2, then that2" to the point that you essentially have two different suites of scripts but just stored in one anyway. bq. We want to avoid the need to update dual modules in two different languages when functionality changes, especially given that many Linux developers are not familiar with powershell or bat, and many Windows developers are not familiar with shell or bash. I think this is the real message: the Linux developers... which should be read as the Java developers who work on Hadoop... don't know bash and fundamentally ignore most attempts from outside to improve them. Switching to something else isn't going to change this problem. Instead, it'll just allow for them to continue ignoring the community in favor of their own changes. Perhaps the fundamental problem is this: Why are so many launcher changes even necessary? Why isn't Hadoop smart enough to figure out some of these things after Java is launched? Have we even seriously attempted a simplification of the scripts? (I suspect just using functions instead of the craziness around exported variables would make a world of difference.) Has there been any thought about actually creating real configuration files built by installers so we don't have to recompute a half-dozen things on every run? Side-note: it would be interesting to see the memory footprint requirement differences on something like one of Yahoo!'s gateways. Sure, individually it isn't much. But at scale... Anyway, I've given my $0.02. Do what you want, I won't stop you. But I do question the thinking behind it. 
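The ${BASH_SOURCE} functionality mentioned above is what lets a script find its own install location with no environment variables preset. A minimal, self-contained sketch of the idiom (not the actual Hadoop code):

```shell
#!/usr/bin/env bash
# Sketch of the ${BASH_SOURCE} self-location idiom: the script resolves
# the directory it lives in, so callers need not export HADOOP_HOME or
# any similar variable beforehand.
this="${BASH_SOURCE[0]}"
bin=$(cd -P -- "$(dirname -- "${this}")" >/dev/null 2>&1 && pwd -P)
echo "script directory: ${bin}"
# Everything else (conf dir, jar locations) can then be derived from ${bin}.
```

A Python equivalent exists (via `__file__`), but the point in the comment is that the bash scripts already rely on this behavior today.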
Select and document a platform-independent scripting language for use in Hadoop environment --- Key: HADOOP-9082 URL: https://issues.apache.org/jira/browse/HADOOP-9082 Project: Hadoop Common Issue Type: Bug Reporter: Matt Foley This issue is going to be discussed at length in the common-dev@ mailing list, under topic [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13878819#comment-13878819 ] Allen Wittenauer commented on HADOOP-9902: -- At least for trunk (hadoop 3.x), yes, remove the deprecation checks. For any backport, the dep checks would need to stay (obviously). (I've noticed that people are adding sub-commands to the deprecation checks that were never in hadoop 1.x and earlier.) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Affects Version/s: 3.0.0 Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0, 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907228#comment-13907228 ] Allen Wittenauer commented on HADOOP-9902: -- Let's talk about the 'classpath' subcommand. Today, hadoop classpath returns the classpath of common, hdfs, and yarn. To me, this seems to be the wrong behavior for two major reasons: * Common has to have knowledge of the subsystems that rely upon it. Ultimately, this is a reverse dependency and (I hope) we can all agree those are bad. * If I'm building an application that only needs access to (common|hdfs|yarn|mapreduce), my classpath is polluted with extra garbage from the other subproject(s) that I may or may not need. (yarn does offer a classpath subcommand but it's essentially the same thing as the hadoop classpath. The mapreduce classpath is... yeah...) On the plus side, it's one stop shopping. Hooray! I get everything!, some developer likely said somewhere. So I'd like to throw out a proposal. I want to re-implement the classpath subcommand such that (hadoop|hdfs|yarn) only return the base classpath for their project. This is (obviously) an incompatible change. Someone who wanted to know what all the classpaths were for all the projects would be required to run all the commands. To make up for it, however, I believe I can *easily* introduce a classpath subcommand for *every* command that uses the common framework. For the non-major commands, I suspect this would be a massive win for debugging. What the heck is start-dfs.sh using when it fires up the namenode?, I have said to myself many, many times, using more curse words, some of which you might not have heard before. Another choice might be to have some tricky logic to have subprojects 'register' into the main project on install such that commands like 'hadoop classpath' now know about those subprojects. It won't solve the second bullet point, but it does fix the first. Thoughts? 
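To make the proposal concrete, here is a hedged sketch of what a shared classpath subcommand in the common framework might look like: print whatever CLASSPATH the framework assembled for the current project, with duplicates removed. The function and variable names are illustrative only, not from the actual patch.

```shell
# Hypothetical sketch of a shared "classpath" subcommand. CLASSPATH is
# assumed to be whatever the framework built for the current project;
# duplicates are dropped while preserving order.
hadoop_print_classpath() {
  local IFS=':'
  local entry seen=':' out=''
  for entry in ${CLASSPATH}; do
    case "${seen}" in
      *":${entry}:"*) ;;                     # duplicate, skip
      *) seen="${seen}${entry}:"
         out="${out:+${out}:}${entry}" ;;
    esac
  done
  printf '%s\n' "${out}"
}

CLASSPATH="/opt/hadoop/etc:/opt/hadoop/lib:/opt/hadoop/etc"
hadoop_print_classpath    # -> /opt/hadoop/etc:/opt/hadoop/lib
```

Because each project would only feed its own jars into CLASSPATH, the reverse dependency in the first bullet disappears by construction.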
Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0, 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10322) Add ability to read principal names from a keytab
[ https://issues.apache.org/jira/browse/HADOOP-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907538#comment-13907538 ] Allen Wittenauer commented on HADOOP-10322: --- What happens when there are multiple principals in a keytab? Add ability to read principal names from a keytab - Key: HADOOP-10322 URL: https://issues.apache.org/jira/browse/HADOOP-10322 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 2.2.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, javadoc-warnings.txt It will be useful to have an ability to enumerate the principals stored in a keytab. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10322) Add ability to read principal names from a keytab
[ https://issues.apache.org/jira/browse/HADOOP-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920374#comment-13920374 ] Allen Wittenauer commented on HADOOP-10322: --- I have a feeling I didn't ask my question properly. If I have a keytab with:
{code}
HTTP/hostname1@REALM
hdfs/hostname1@REALM
HTTP/hostname2@REALM
hdfs/hostname2@REALM
{code}
inside of it, we'll do the correct thing? I'm just concerned about surprises callers may face with more complex (but legal!) setups. Add ability to read principal names from a keytab - Key: HADOOP-10322 URL: https://issues.apache.org/jira/browse/HADOOP-10322 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 2.2.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, HADOOP-10322.patch, javadoc-warnings.txt It will be useful to have an ability to enumerate the principals stored in a keytab. -- This message was sent by Atlassian JIRA (v6.2#6252)
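For anyone checking the multi-principal case by hand: MIT Kerberos can enumerate the principals in a keytab with klist. In this sketch the klist output is inlined as sample data so the snippet is self-contained; against a real keytab you would run the commented command instead.

```shell
# Enumerate the unique principals in a keytab. Normally:
#   klist -k /etc/security/keytab/service.keytab
# Here the output is inlined as sample data so the sketch runs anywhere.
sample='Keytab name: FILE:/etc/security/keytab/service.keytab
KVNO Principal
---- ------------------------------------------------------------
   3 HTTP/hostname1@REALM
   3 hdfs/hostname1@REALM
   3 HTTP/hostname2@REALM
   3 hdfs/hostname2@REALM'

# Skip the three header lines; print the unique principal names.
principals=$(printf '%s\n' "${sample}" | awk 'NR > 3 { print $2 }' | sort -u)
printf '%s\n' "${principals}"
```

A keytab can legally hold several key versions (KVNOs) of the same principal as well, which is another case callers enumerating principals need to deduplicate.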
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004152#comment-14004152 ] Allen Wittenauer commented on HADOOP-9902: -- YARN-1429 has (unintentionally) added a lot of complexity with questionable benefits, likely due to the parties involved not realizing that the functionality they seek is already there. In any case, with this change, we have this situation:
{code}
HADOOP_USER_CLASSPATH              = CLASSPATH:HADOOP_USER_CLASSPATH
YARN_USER_CLASSPATH                = CLASSPATH:YARN_USER_CLASSPATH
HADOOP_USER_CLASSPATH +
YARN_USER_CLASSPATH                = CLASSPATH:HADOOP_USER_CLASSPATH:YARN_USER_CLASSPATH
HADOOP_USER_CLASSPATH_FIRST        = HADOOP_USER_CLASSPATH:CLASSPATH:YARN_USER_CLASSPATH
YARN_USER_CLASSPATH_FIRST          = YARN_USER_CLASSPATH:CLASSPATH:HADOOP_USER_CLASSPATH
HADOOP_USER_CLASSPATH_FIRST +
YARN_USER_CLASSPATH_FIRST          = YARN_USER_CLASSPATH:HADOOP_USER_CLASSPATH:CLASSPATH
{code}
In the case of the other YARN_xxx dupes, the new code causes an override for YARN apps. In order to keep consistency with the rest of the system, we should probably keep the same logic... essentially, YARN_USER_CLASSPATH will override HADOOP_USER_CLASSPATH entirely when running 'yarn xxx'. Ideally, we'd just back out YARN-1429 though and inform users of HADOOP_USER_CLASSPATH. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0, 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
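The override rule suggested here (a set YARN_USER_CLASSPATH entirely replacing HADOOP_USER_CLASSPATH under 'yarn xxx', consistent with the other YARN_* variables) could be sketched like this; the function name is illustrative, not the committed logic:

```shell
# Sketch of the proposed rule: for a yarn subcommand, a non-empty
# YARN_USER_CLASSPATH overrides HADOOP_USER_CLASSPATH entirely,
# mirroring how the other YARN_* variables shadow their HADOOP_* twins.
# Names here are illustrative only.
user_classpath_for() {
  local project="$1"
  if [ "${project}" = "yarn" ] && [ -n "${YARN_USER_CLASSPATH}" ]; then
    printf '%s\n' "${YARN_USER_CLASSPATH}"
  else
    printf '%s\n' "${HADOOP_USER_CLASSPATH}"
  fi
}

HADOOP_USER_CLASSPATH=/site/common-extras.jar
YARN_USER_CLASSPATH=/site/yarn-extras.jar
user_classpath_for yarn      # -> /site/yarn-extras.jar
user_classpath_for hadoop    # -> /site/common-extras.jar
```

The win over the YARN-1429 approach is that there is exactly one value in play at any time, so the combinatorial orderings in the table above never arise.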
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004967#comment-14004967 ] Allen Wittenauer commented on HADOOP-9902: --
{code}
CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/ahs-config/log4j.properties
...
CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/timelineserver-config/log4j.properties
{code}
The timeline server added more custom (and likely equally undocumented) log4j.properties locations. Needless to say, those are going away too, just like their rm-config and nm-config brethren. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0, 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005165#comment-14005165 ] Allen Wittenauer commented on HADOOP-9902: -- Ran across an interesting discrepancy. hadoop-env.sh says:
{code}
# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
{code}
This implies it could be something that isn't a user. However...
{code}
chown $HADOOP_IDENT_STRING $HADOOP_LOG_DIR
{code}
... we clearly have that assumption. Since the chown has already been removed from the new code, this problem goes away. But should we explicitly state that HADOOP_IDENT_STRING needs to be a user? Is anyone aware of anything else that uses this outside of the Hadoop shell scripts? Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0, 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005196#comment-14005196 ] Allen Wittenauer commented on HADOOP-9902: -- That's very helpful! (Especially since that was going to be the next place I looked. Since I just happen to have it cloned from git on my dev machine, it's going to be one of the first big tests I do as I work towards a commit-able patch. :D ) Bigtop looks like it is doing what I would expect: setting it for Hadoop, but not using it directly. Which seems to indicate that, at least as far as Bigtop is concerned, we could expand the definition beyond "it must be a user". Hadoop also uses HADOOP_IDENT_STR as the setting for the Java hadoop.id.str property. But I can't find a single place where this property is used. IIRC, it was used in ancient times for logging and/or display, but if we don't need the property set anymore because we've gotten wiser, I'd like to just yank that property completely. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0, 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006297#comment-14006297 ] Allen Wittenauer commented on HADOOP-9902: -- So, I found the only place where hadoop.id.str is still getting used (other than setting it):
{code}
./bigtop-packages/src/common/hadoop/conf.secure/log4j.properties:log4j.appender.DRFAS.File=/var/local/hadoop/logs/${hadoop.id.str}/${hadoop.id.str}-auth.log
{code}
On the surface, this looks like a pretty good use case. So I suppose this property lives for another day. But I'm going to nuke yarn.id.str from the face of the earth since nothing references it. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0, 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Affects Version/s: (was: 2.1.1-beta) Status: Patch Available (was: Open) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: HADOOP-9902.patch Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Status: Open (was: Patch Available) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite
[ https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9704: - Status: Open (was: Patch Available) Write metrics sink plugin for Hadoop/Graphite - Key: HADOOP-9704 URL: https://issues.apache.org/jira/browse/HADOOP-9704 Project: Hadoop Common Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Chu Tong Attachments: 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, HADOOP-9704.patch, HADOOP-9704.patch Write a metrics sink plugin for Hadoop to send metrics directly to Graphite in additional to the current ganglia and file ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: (was: HADOOP-9902.patch) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: HADOOP-9902.patch Here's the latest version of this patch against trunk (svn revision 1598750). There are still a few things I want to fix and I'm sure there are bugs floating around here and there. I'd appreciate any feedback on the patch thus far! Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: (was: HADOOP-9902.patch) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: HADOOP-9902.patch Let's try this again: git rev ca2d0153bf3ec2f7f228bb1e68c0cadf4fb2d6c5 svn rev 1598764 Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10389) Native RPCv9 client
[ https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037714#comment-14037714 ] Allen Wittenauer commented on HADOOP-10389: --- bq. Boost in general has a versioning problem which is really bad for libraries. If your library links against version X of boost, any application linking to you needs to use the same version... otherwise bad stuff happens. FWIW, I agree wholeheartedly. Boost's major, blocker-level drawback is its incompatibility with itself. Enough of one that I'm more than willing to -1 any patch that uses it. The operational issues of trying to support multiple versions of boost on a machine are too high of a wall to climb. Been there, done that, didn't even get a t-shirt. Native RPCv9 client --- Key: HADOOP-10389 URL: https://issues.apache.org/jira/browse/HADOOP-10389 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Colin Patrick McCabe Attachments: HADOOP-10388.001.patch, HADOOP-10389-alternative.000.patch, HADOOP-10389.002.patch, HADOOP-10389.004.patch, HADOOP-10389.005.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10389) Native RPCv9 client
[ https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038157#comment-14038157 ] Allen Wittenauer commented on HADOOP-10389: --- Not really, no, because it becomes a code maintenance nightmare whenever the API changes. Oh, we have to compile with this specific version of boost. (Ironically, I was just told a story about someone who had this exact problem with Impala just today.) Historically, the Hadoop project has been trying to jettison those types of dependencies. (Hai forrest.) As to libuv, I'm still looking at Colin's patch. I have some concerns about libuv's stability as well (esp given http://upstream-tracker.org/versions/libuv.html) since I lack experience with it... unlike boost. Native RPCv9 client --- Key: HADOOP-10389 URL: https://issues.apache.org/jira/browse/HADOOP-10389 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Colin Patrick McCabe Attachments: HADOOP-10388.001.patch, HADOOP-10389-alternative.000.patch, HADOOP-10389.002.patch, HADOOP-10389.004.patch, HADOOP-10389.005.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-8989) hadoop dfs -find feature
[ https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042627#comment-14042627 ] Allen Wittenauer commented on HADOOP-8989: -- We should work on trying to avoid GNU or Linux -isms, but in this particular case it seems reasonable to default to pwd. hadoop dfs -find feature Key: HADOOP-8989 URL: https://issues.apache.org/jira/browse/HADOOP-8989 Project: Hadoop Common Issue Type: New Feature Reporter: Marco Nicosia Assignee: Jonathan Allen Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch Both sysadmins and users make frequent use of the unix 'find' command, but Hadoop has no correlate. Without this, users are writing scripts which make heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs -lsr is somewhat taxing on the NameNode, and a really slow experience on the client side. Possibly an in-NameNode find operation would be only a bit more taxing on the NameNode, but significantly faster from the client's point of view? The minimum set of options I can think of which would make a Hadoop find command generally useful is (in priority order):
* -type (file or directory, for now)
* -atime/-ctime/-mtime (... and -creationtime?) (both + and - arguments)
* -print0 (for piping to xargs -0)
* -depth
* -owner/-group (and -nouser/-nogroup)
* -name (allowing for shell pattern, or even regex?)
* -perm
* -size
One possible special case, but could possibly be really cool if it ran from within the NameNode:
* -delete
The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow. 
Lower priority, some people do use operators, mostly to execute -or searches such as: * find / \(-nouser -or -nogroup\) Finally, I thought I'd include a link to the [Posix spec for find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html] -- This message was sent by Atlassian JIRA (v6.2#6252)
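The requested options track POSIX find closely; for reference, the local-filesystem equivalents of a few of them behave as below (the eventual HDFS syntax, e.g. a `hadoop fs -find` form, may differ from this sketch):

```shell
# Local-filesystem demonstrations of the option semantics requested
# above. The eventual HDFS command syntax may differ; this only shows
# what -type, -name, and -print0 mean in POSIX find.
tmp=$(mktemp -d)
mkdir -p "${tmp}/logs"
touch "${tmp}/logs/app.log" "${tmp}/README"

find "${tmp}" -type f -name '*.log'                          # files matching a shell glob
find "${tmp}" -type d                                        # directories only
find "${tmp}" -type f -print0 | xargs -0 ls -l >/dev/null    # NUL-safe piping to xargs

rm -rf "${tmp}"
```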
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: HADOOP-9902-2.patch Should apply to git commit faf2d78012fd6fcf5fc433ab85b2dbc9d672c125 Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: HADOOP-9902-2.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Labels: releasenotes (was: ) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Labels: releasenotes (was: in releasenotes) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Labels: in releasenotes (was: releasenotes) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Status: Patch Available (was: Open) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050225#comment-14050225 ] Allen Wittenauer commented on HADOOP-9902: -- Those failures don't look related to this patch. My to do list: * Write some release notes (incompat changes, new features, bug fixes) * Some relatively minor code cleanup * Add some better comments on hadoop-functions.sh, including expected input/output for users who replace those functions * Add some developer notes (how do i add a subcommand? how do i add a new command?), probably on the wiki * More security testing * Create some related JIRAs (unit testing, rewrite the scripts that I'm skipping this pass) FWIW, I'm still leaning towards breaking 'hadoop classpath'. I've had a few discussions offline and many people seem to think that breaking it is actually a good idea. That said... it'd be good to have more folks test this in the Real World. I'd like to get this committed soon-ish. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Release Note: The Hadoop shell scripts have been rewritten to fix many long-standing bugs and include some new features. While an eye has been kept towards compatibility, some changes may break existing installations. INCOMPATIBLE CHANGES:
* The pid files for secure daemons have been renamed to include the appropriate $HADOOP_IDENT_STR. This should allow, with proper configurations in place, for multiple versions of the same secure daemon to run on a host.
* All YARN_* and MAPRED_* environment variables act as overrides to their equivalent HADOOP_* environment variables when 'yarn', 'mapred' and related commands are executed. Previously, these were separated, which meant duplication of common settings.
* All Hadoop shell script subsystems execute hadoop-env.sh, which allows for all of the environment variables to be in one location. This was not the case previously.
* hdfs-config.sh and hdfs-config.cmd were inadvertently duplicated in two different locations. The sbin version has been removed.
* The log4j settings forcibly set by some *-daemon.sh commands have been removed. These are now configurable in the *-env.sh files. Users who do not have these set will see logs going in odd places.
* Support for various undocumented YARN log4j.properties files has been removed.
* Support for $HADOOP_MASTER and the rsync code has been removed.
* yarn.id.str has been removed.
* We now require bash v3 or better.
BUG FIXES:
* HADOOP_CONF_DIR is now properly honored.
* Documented hadoop-layout.sh.
* Shell commands should now work properly when called as a relative path.
* Operations which trigger ssh will now limit how many connections run in parallel to 10 to prevent memory and network exhaustion.
* HADOOP_CLIENT_OPTS support has been added to a few more commands. 
* Various options on hadoop command lines were supported inconsistently. These have been unified into hadoop-config.sh. --config still needs to come first, however. * ulimit logging for secure daemons no longer assumes /bin/bash but does assume bash on the command line path. * Removed references to some Yahoo! specific paths. IMPROVEMENTS: * Significant amounts of redundant code have been moved into a new file called hadoop-functions.sh. * Improved information in *-env.sh on what can be set, ramifications of setting, etc. * There is an attempt to do some trivial deduplication of the classpath and JVM options. This allows, amongst other things, for custom settings in *_OPTS for Hadoop daemons to override defaults and other generic settings (i.e., $HADOOP_OPTS). This is particularly relevant for Xmx settings, as one can now set them in _OPTS and ignore the heap specific options for daemons which force the size in megabytes. * Operations which trigger ssh connections can now use pdsh if installed. $HADOOP_SSH_OPTS still gets applied. * Subcommands have been alphabetized in both usage and in the code. * All/most of the functionality provided by the sbin/* commands has been moved to either their bin/ equivalents or made into functions. The rewritten versions of these commands are now wrappers to maintain backward compatibility. Of particular note is the new --daemon option present in some bin/ commands which allow certain subcommands to be daemonized. * It is now possible to override some of the shell code capabilities to provide site specific functionality. * A new option called --buildpaths will attempt to add developer build directories to the classpath to allow for in source tree testing. * If a usage function is defined, -h, -help, and --help will all trigger a help message. * Several generic environment variables have been added to provide a common configuration for pids, logs, and their security equivalents. 
The older versions still act as overrides to these generic versions. * Groundwork has been laid to allow for custom secure daemon setup using something other than jsvc. Hadoop Flags: Incompatible change Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
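The YARN_*/MAPRED_* override behavior described in the release note above can be sketched in bash. This is an illustrative reimplementation under assumed names, not the actual hadoop-functions.sh code:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: a subsystem-specific variable (YARN_*), if set,
# overrides its generic HADOOP_* equivalent. Function and variable names
# here are illustrative assumptions.

hadoop_override_envvar() {
  local subvar=$1
  local genericvar=$2
  # Indirect expansion reads the variable whose name is in ${subvar};
  # printf -v writes into the variable whose name is in ${genericvar}.
  if [[ -n "${!subvar}" ]]; then
    printf -v "${genericvar}" '%s' "${!subvar}"
  fi
}

YARN_LOG_DIR="/var/log/yarn"
HADOOP_LOG_DIR="/var/log/hadoop"
hadoop_override_envvar YARN_LOG_DIR HADOOP_LOG_DIR
echo "${HADOOP_LOG_DIR}"
```

Since the override only fires when the subsystem variable is non-empty, installations that never set YARN_* keep their HADOOP_* settings untouched.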
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Release Note: The Hadoop shell scripts have been rewritten to fix many long-standing bugs and include some new features. While an eye has been kept towards compatibility, some changes may break existing installations. INCOMPATIBLE CHANGES: * The pid files for secure daemons have been renamed to include the appropriate $HADOOP_IDENT_STR. This should allow, with proper configurations in place, for multiple versions of the same secure daemon to run on a host. * All YARN_* and MAPRED_* environment variables act as overrides to their equivalent HADOOP_* environment variables when 'yarn', 'mapred' and related commands are executed. Previously, these were separated which meant duplication of common settings. * All Hadoop shell script subsystems execute hadoop-env.sh, which allows for all of the environment variables to be in one location. This was not the case previously. * hdfs-config.sh and hdfs-config.cmd were inadvertently duplicated in two different locations. The sbin version has been removed. * The log4j settings forcibly set by some *-daemon.sh commands have been removed. This is now configurable in the *-env.sh files. Users who do not have these set will see logs going in odd places. * Support for various undocumented YARN log4j.properties files has been removed. * Support for $HADOOP_MASTER and the rsync code have been removed. * yarn.id.str has been removed. * We now require bash v3 or better. BUG FIXES: * HADOOP_CONF_DIR is now properly honored. * Documented hadoop-layout.sh. * Shell commands should now work properly when called as a relative path. * Operations which trigger ssh will now limit how many connections run in parallel to 10 to prevent memory and network exhaustion. * HADOOP_CLIENT_OPTS support has been added to a few more commands. 
* Various options on hadoop command lines were supported inconsistently. These have been unified into hadoop-config.sh. --config still needs to come first, however. * ulimit logging for secure daemons no longer assumes /bin/bash but does assume bash on the command line path. * Removed references to some Yahoo! specific paths. IMPROVEMENTS: * Significant amounts of redundant code have been moved into a new file called hadoop-functions.sh. * Improved information in *-env.sh on what can be set, ramifications of setting, etc. * There is an attempt to do some trivial deduplication of the classpath and JVM options. This allows, amongst other things, for custom settings in *_OPTS for Hadoop daemons to override defaults and other generic settings (i.e., $HADOOP_OPTS). This is particularly relevant for Xmx settings, as one can now set them in _OPTS and ignore the heap specific options for daemons which force the size in megabytes. * Operations which trigger ssh connections can now use pdsh if installed. $HADOOP_SSH_OPTS still gets applied. * Subcommands have been alphabetized in both usage and in the code. * All/most of the functionality provided by the sbin/* commands has been moved to either their bin/ equivalents or made into functions. The rewritten versions of these commands are now wrappers to maintain backward compatibility. Of particular note is the new --daemon option present in some bin/ commands which allow certain subcommands to be daemonized. * It is now possible to override some of the shell code capabilities to provide site specific functionality. * A new option called --buildpaths will attempt to add developer build directories to the classpath to allow for in source tree testing. * If a usage function is defined, -h, -help, and --help will all trigger a help message. * Several generic environment variables have been added to provide a common configuration for pids, logs, and their security equivalents. 
The older versions still act as overrides to these generic versions. * Groundwork has been laid to allow for custom secure daemon setup using something other than jsvc. was: The Hadoop shell scripts have been rewritten to fix many long standing bugs and include some new features. While an eye has been kept towards compatibility, some changes may break existing installations. INCOMPATIBLE CHANGES: * The pid files for secure daemons have been renamed to include the appropriate $HADOOP_IDENT_STR. This should allow, with proper configurations in place, for multiple versions of the same secure daemon to run on a host. * All YARN_* and MAPRED_* environment variables act as overrides to their equivalent HADOOP_* environment variables when 'yarn', 'mapred' and related commands are executed. Previously, these were separated which meant duplication of common settings. * All Hadoop shell script subsystems execute hadoop-env.sh, which allows for all of the environment variables to be
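The "trivial deduplication of the classpath" mentioned in the notes above amounts to appending an entry only when it is not already present. A minimal sketch, with an assumed function name (the real hadoop-functions.sh helper may differ):

```shell
#!/usr/bin/env bash
# Illustrative classpath deduplication: skip an entry that is already on
# the colon-separated CLASSPATH. The function name is an assumption.

hadoop_add_classpath() {
  local entry=$1
  # Wrap both sides in colons so substring matching works at either end.
  case ":${CLASSPATH}:" in
    *":${entry}:"*) ;;   # already present; skip
    *) CLASSPATH="${CLASSPATH:+${CLASSPATH}:}${entry}" ;;
  esac
}

CLASSPATH=""
hadoop_add_classpath "/opt/hadoop/share/hadoop/common"
hadoop_add_classpath "/opt/hadoop/share/hadoop/common"   # duplicate, ignored
echo "${CLASSPATH}"
```

The `${CLASSPATH:+...}` expansion avoids a leading colon when the list starts out empty.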
[jira] [Created] (HADOOP-10787) Rename DEFAULT_LIBEXEC_DIR from the shell scripts
Allen Wittenauer created HADOOP-10787: - Summary: Rename DEFAULT_LIBEXEC_DIR from the shell scripts Key: HADOOP-10787 URL: https://issues.apache.org/jira/browse/HADOOP-10787 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer DEFAULT_LIBEXEC_DIR pollutes the shell name space. It should be renamed to HADOOP_DEFAULT_LIBEXEC_DIR. Unfortunately, this touches every single shell script. -- This message was sent by Atlassian JIRA (v6.2#6252)
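One way the proposed rename could keep existing scripts working is to honor the legacy name as a fallback. A hedged sketch (paths and fallback behavior are assumptions, not part of the ticket):

```shell
#!/usr/bin/env bash
# Illustrative rename: prefer the namespaced HADOOP_DEFAULT_LIBEXEC_DIR,
# falling back to the legacy DEFAULT_LIBEXEC_DIR so older configs still
# work. The prefix path is hypothetical.

HADOOP_PREFIX="/opt/hadoop"
DEFAULT_LIBEXEC_DIR="${HADOOP_PREFIX}/libexec"   # legacy name, pollutes the namespace

# Namespaced variable wins if already set; otherwise inherit the legacy value.
HADOOP_DEFAULT_LIBEXEC_DIR="${HADOOP_DEFAULT_LIBEXEC_DIR:-${DEFAULT_LIBEXEC_DIR}}"

echo "${HADOOP_DEFAULT_LIBEXEC_DIR}"
```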
[jira] [Commented] (HADOOP-10787) Rename DEFAULT_LIBEXEC_DIR from the shell scripts
[ https://issues.apache.org/jira/browse/HADOOP-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053137#comment-14053137 ] Allen Wittenauer commented on HADOOP-10787: --- This should be done after HADOOP-9902. Rename DEFAULT_LIBEXEC_DIR from the shell scripts - Key: HADOOP-10787 URL: https://issues.apache.org/jira/browse/HADOOP-10787 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer DEFAULT_LIBEXEC_DIR pollutes the shell name space. It should be renamed to HADOOP_DEFAULT_LIBEXEC_DIR. Unfortunately, this touches every single shell script. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10788) Rewrite httpfs, kms, sls, and other stragglers from HADOOP-9902
Allen Wittenauer created HADOOP-10788: - Summary: Rewrite httpfs, kms, sls, and other stragglers from HADOOP-9902 Key: HADOOP-10788 URL: https://issues.apache.org/jira/browse/HADOOP-10788 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer There are some stragglers not targeted by HADOOP-9902. These should also get rewritten to use the new hadoop-functions.sh framework. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Release Note: The Hadoop shell scripts have been rewritten to fix many long-standing bugs and include some new features. While an eye has been kept towards compatibility, some changes may break existing installations. INCOMPATIBLE CHANGES: * The pid files for secure daemons have been renamed to include the appropriate $HADOOP_IDENT_STR. This should allow, with proper configurations in place, for multiple versions of the same secure daemon to run on a host. * All YARN_* and MAPRED_* environment variables act as overrides to their equivalent HADOOP_* environment variables when 'yarn', 'mapred' and related commands are executed. Previously, these were separated which meant duplication of common settings. * All Hadoop shell script subsystems execute hadoop-env.sh, which allows for all of the environment variables to be in one location. This was not the case previously. * hdfs-config.sh and hdfs-config.cmd were inadvertently duplicated into libexec and sbin during install. The sbin version has been removed. * The log4j settings forcibly set by some *-daemon.sh commands have been removed. This is now configurable in the *-env.sh files. Users who do not have these set will see logs going in odd places. * Support for various undocumented YARN log4j.properties files has been removed. * Support for $HADOOP_MASTER and the related rsync code have been removed. * yarn.id.str has been removed. * We now require bash v3 (released July 27, 2004) or better in order to take advantage of better regex handling. * Support for --script has been removed from the sbin commands. We now use $HADOOP_*_PATH or $HADOOP_PREFIX to find the necessary binaries. BUG FIXES: * HADOOP_CONF_DIR is now properly honored everywhere. * Documented hadoop-layout.sh. * Added better comments to *-env.sh. 
* Shell commands should now work properly when called as a relative path and without HADOOP_PREFIX. If HADOOP_PREFIX is not set, it will be automatically determined based upon the current location of the shell command. Note that other parts of the ecosystem may require this environment variable to be configured. * Operations which trigger ssh will now limit how many connections run in parallel to 10 to prevent memory and network exhaustion. * HADOOP_CLIENT_OPTS support has been added to a few more commands. * Various options on hadoop command lines were supported inconsistently. These have been unified into hadoop-config.sh. --config still needs to come first, however. * ulimit logging for secure daemons no longer assumes /bin/bash but does assume bash on the command line path. * Removed references to some Yahoo! specific paths. * Removed unused slaves.sh from YARN build tree. IMPROVEMENTS: * Significant amounts of redundant code have been moved into a new file called hadoop-functions.sh. * Improved information in *-env.sh on what can be set, ramifications of setting, etc. * There is an attempt to do some trivial deduplication of the classpath and JVM options. This allows, amongst other things, for custom settings in *_OPTS for Hadoop daemons to override defaults and other generic settings (i.e., $HADOOP_OPTS). This is particularly relevant for Xmx settings, as one can now set them in _OPTS and ignore the heap specific options for daemons which force the size in megabytes. * Operations which trigger ssh connections can now use pdsh if installed. $HADOOP_SSH_OPTS still gets applied. * Subcommands have been alphabetized in both usage and in the code. * All/most of the functionality provided by the sbin/* commands has been moved to either their bin/ equivalents or made into functions. The rewritten versions of these commands are now wrappers to maintain backward compatibility. 
Of particular note is the new --daemon option present in some bin/ commands which allow certain subcommands to be daemonized. * It is now possible to override some of the shell code capabilities to provide site specific functionality. * A new option called --buildpaths will attempt to add developer build directories to the classpath to allow for in source tree testing. * If a usage function is defined, the following will trigger a help message if it is given in the option path to the shell script: --? -? ? --help -help -h help * Several generic environment variables have been added to provide a common configuration for pids, logs, and their security equivalents. The older versions still act as overrides to these generic versions. * Groundwork has been laid to allow for custom secure daemon setup using something other than jsvc. * Added distch subcommand to hadoop command. was: The Hadoop shell scripts have been rewritten to fix many long standing bugs and include some new features. While an eye
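The help triggers listed in the notes above (--? -? ? --help -help -h help) are easy to express as a single case pattern; note that unquoted `?` must be escaped or it becomes a glob. A sketch with assumed function names, not the actual dispatch code:

```shell
#!/usr/bin/env bash
# Illustrative help dispatch: any help-ish token shows usage when a
# usage function has been defined. Function names are assumptions.

hadoop_usage() {
  echo "Usage: hadoop [--config confdir] COMMAND"
}

# Returns 0 (printing usage if available) for a help trigger, 1 otherwise.
hadoop_maybe_show_help() {
  case "$1" in
    --\?|-\?|\?|--help|-help|-h|help)
      # Only print if a usage function is actually defined.
      declare -F hadoop_usage >/dev/null && hadoop_usage
      return 0
      ;;
    *)
      return 1
      ;;
  esac
}

hadoop_maybe_show_help -h
```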
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Status: Open (was: Patch Available) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Target Version/s: 3.0.0 Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: HADOOP-9902-3.patch Great feedback! Here's a new patch that incorporates that change as well as fixing a ton of bugs (esp around secure daemons), has some code cleanup, env var namespace cleanup, and one or two minor new features. (See the release notes). I think I'm at the point where I need other people to start banging on this patch to test it out. The sooner we get it into trunk, the better. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Status: Patch Available (was: Open) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10807) GenericOptionsParser needs updating for Hadoop 2.x+
Allen Wittenauer created HADOOP-10807: - Summary: GenericOptionsParser needs updating for Hadoop 2.x+ Key: HADOOP-10807 URL: https://issues.apache.org/jira/browse/HADOOP-10807 Project: Hadoop Common Issue Type: Bug Components: util Reporter: Allen Wittenauer Priority: Minor The options presented to users, the comments, etc, are all woefully out of date and don't reflect the current reality. These should be updated for Hadoop 2.x and up. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10807) GenericOptionsParser needs updating for Hadoop 2.x+
[ https://issues.apache.org/jira/browse/HADOOP-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056735#comment-14056735 ] Allen Wittenauer commented on HADOOP-10807: --- Daddy, what's a jobtracker? Well, you see, back when telephones were the sizes of bricks... GenericOptionsParser needs updating for Hadoop 2.x+ --- Key: HADOOP-10807 URL: https://issues.apache.org/jira/browse/HADOOP-10807 Project: Hadoop Common Issue Type: Bug Components: util Reporter: Allen Wittenauer Priority: Minor Labels: newbie The options presented to users, the comments, etc, are all woefully out of date and don't reflect the current reality. These should be updated for Hadoop 2.x and up. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: HADOOP-9902-4.patch Updated patch for git rev 8d5e8c860ed361ed792affcfe06f1a34b017e421. This includes many edge case bug fixes, a much more consistent coding style, the requested addition of the hadoop jnipath command, and a run through shellcheck. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902-4.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Status: Patch Available (was: Open) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902-4.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Status: Open (was: Patch Available) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902-4.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Status: Open (was: Patch Available) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902-4.patch, HADOOP-9902-5.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: HADOOP-9902-5.patch Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902-4.patch, HADOOP-9902-5.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Release Note: The Hadoop shell scripts have been rewritten to fix many long-standing bugs and include some new features. While an eye has been kept towards compatibility, some changes may break existing installations. INCOMPATIBLE CHANGES: * The pid, out, etc. files for secure daemons have been renamed to include the appropriate ${HADOOP_IDENT_STR}. This should allow, with proper configurations in place, for multiple versions of the same secure daemon to run on a host. * All Hadoop shell script subsystems execute hadoop-env.sh, which allows for all of the environment variables to be in one location. This was not the case previously. * The default content of *-env.sh has been significantly altered, with the majority of defaults moved into more protected areas. * All YARN_* and MAPRED_* environment variables act as overrides to their equivalent HADOOP_* environment variables when 'yarn', 'mapred' and related commands are executed. Previously, these were separated out which meant a significant amount of duplication of common settings. * hdfs-config.sh and hdfs-config.cmd were inadvertently duplicated into libexec and sbin. The sbin versions have been removed. * The log4j settings forcibly set by some *-daemon.sh commands have been removed. These settings are now configurable in the *-env.sh files, in particular via *_OPT. * Support for various undocumented YARN log4j.properties files has been removed. * Support for $HADOOP_MASTER and the related rsync code have been removed. * yarn.id.str has been removed. * We now require bash v3 (released July 27, 2004) or better in order to take advantage of better regex handling and ${BASH_SOURCE}. POSIX sh will not work. * Support for --script has been removed. We now use ${HADOOP_*_PATH} or ${HADOOP_PREFIX} to find the necessary binaries. 
(See other note regarding ${HADOOP_PREFIX} auto discovery.) * Non-existent classpaths, ld.so library paths, JNI library paths, etc, will be ignored and stripped from their respective environment settings. BUG FIXES: * ${HADOOP_CONF_DIR} is now properly honored everywhere. * Documented hadoop-layout.sh with a provided hadoop-layout.sh.example file. * Shell commands should now work properly when called as a relative path and without HADOOP_PREFIX being defined. If ${HADOOP_PREFIX} is not set, it will be automatically determined based upon the current location of the shell library. Note that other parts of the ecosystem may require this environment variable to be configured. * Operations which trigger ssh will now limit the number of connections to run in parallel to ${HADOOP_SSH_PARALLEL} to prevent memory and network exhaustion. By default, this is set to 10. * ${HADOOP_CLIENT_OPTS} support has been added to a few more commands. * Various options on hadoop command lines were supported inconsistently. These have been unified into hadoop-config.sh. --config still needs to come first, however. * ulimit logging for secure daemons no longer assumes /bin/bash but does assume bash on the command line path. * Removed references to some Yahoo! specific paths. * Removed unused slaves.sh from YARN build tree. IMPROVEMENTS: * Significant amounts of redundant code have been moved into a new file called hadoop-functions.sh. * Improved information in the default *-env.sh on what can be set, ramifications of setting, etc. * There is an attempt to do some trivial deduplication and sanitization of the classpath and JVM options. This allows, amongst other things, for custom settings in *_OPTS for Hadoop daemons to override defaults and other generic settings (i.e., $HADOOP_OPTS). This is particularly relevant for Xmx settings, as one can now set them in _OPTS and ignore the heap specific options for daemons which force the size in megabytes. 
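The ${HADOOP_PREFIX} auto-discovery described above relies on ${BASH_SOURCE}, which is one reason bash v3+ is required. A hedged sketch of how such discovery might look; the bin/-under-prefix layout is an assumption:

```shell
#!/usr/bin/env bash
# Illustrative auto-discovery: if HADOOP_PREFIX is unset, derive it from
# the on-disk location of the running script. Assumes the script lives
# one directory below the prefix (e.g., ${HADOOP_PREFIX}/bin).

if [[ -z "${HADOOP_PREFIX}" ]]; then
  this="${BASH_SOURCE-$0}"
  # -P resolves symlinks so the prefix is a physical path.
  bin_dir=$(cd -P -- "$(dirname -- "${this}")" >/dev/null && pwd -P)
  HADOOP_PREFIX=$(cd -P -- "${bin_dir}/.." >/dev/null && pwd -P)
  export HADOOP_PREFIX
fi
echo "${HADOOP_PREFIX}"
```

As the release note warns, other parts of the ecosystem may still expect this variable to be set explicitly.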
* Operations which trigger ssh connections can now use pdsh if installed. $HADOOP_SSH_OPTS still gets applied. * Subcommands have been alphabetized in both usage and in the code. * All/most of the functionality provided by the sbin/* commands has been moved to either their bin/ equivalents or made into functions. The rewritten versions of these commands are now wrappers to maintain backward compatibility. Of particular note is the new --daemon option present in some bin/ commands which allow certain subcommands to be daemonized. * It is now possible to override some of the shell code capabilities to provide site specific functionality. * A new option called --buildpaths will attempt to add developer build directories to the classpath to allow for in source tree testing. * If a usage function is defined, the following will trigger a help message if it is given in the option path to the shell script: --? -? ? --help -help -h help *
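The ssh/pdsh fan-out behavior in the notes above can be sketched as follows. This is an illustration only: HADOOP_SSH_PARALLEL is named in the notes, but the function name, the pdsh invocation style, and the xargs fallback are assumptions about one possible implementation:

```shell
#!/usr/bin/env bash
# Illustrative worker fan-out: prefer pdsh when installed, otherwise
# fall back to ssh, capping concurrency with xargs to avoid memory and
# network exhaustion. Names and flags are assumptions.

HADOOP_SSH_PARALLEL=${HADOOP_SSH_PARALLEL:-10}
HADOOP_SSH_OPTS=${HADOOP_SSH_OPTS:-"-o ConnectTimeout=5"}

hadoop_connect_to_hosts() {
  local hostfile=$1
  shift
  if command -v pdsh >/dev/null 2>&1; then
    # pdsh handles its own fan-out; -w ^file reads hosts from a file,
    # and PDSH_SSH_ARGS_APPEND passes the extra ssh options through.
    PDSH_SSH_ARGS_APPEND="${HADOOP_SSH_OPTS}" pdsh -w "^${hostfile}" "$@"
  else
    # Limit concurrent ssh sessions to ${HADOOP_SSH_PARALLEL}.
    xargs -P "${HADOOP_SSH_PARALLEL}" -I{} \
      ssh ${HADOOP_SSH_OPTS} {} "$@" < "${hostfile}"
  fi
}
```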
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061399#comment-14061399 ] Allen Wittenauer commented on HADOOP-9902: -- This same patch applies to branch-2, if someone wants to play on a relatively closer to live system. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902-4.patch, HADOOP-9902-5.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062380#comment-14062380 ] Allen Wittenauer commented on HADOOP-9902: -- Added https://wiki.apache.org/hadoop/ShellScriptProgrammingGuide . Also, a status update: I have some output from kill to send to /dev/null (triggers extraneous output when the daemon is already down) and some updates on the comments left to do. Barring any additional feedback, my goals for this patch have been met and I mostly consider it finished. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902-4.patch, HADOOP-9902-5.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
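The kill-output cleanup mentioned in the status update above is a one-line fix in spirit: when the daemon is already down, kill prints an error, so its output goes to /dev/null. A minimal illustration; the pid file path and value are hypothetical:

```shell
#!/usr/bin/env bash
# Stopping an already-dead daemon: kill fails noisily on a stale pid, so
# redirect its output and report cleanly instead.

pidfile="/tmp/hadoop-example-daemon.pid"
echo 99999999 > "${pidfile}"   # a pid far above any default pid_max, so it cannot exist

pid=$(cat "${pidfile}")
if ! kill "${pid}" >/dev/null 2>&1; then
  echo "daemon is not running"
fi
rm -f "${pidfile}"
```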
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: HADOOP-9902-6.patch vs -5, this patch fixes a kill output not sent to /dev/null, updates some of the hadoop-env.sh commentary, and fixes a code style issue in rotate_logs. If there is interest, I can make a patch for 2.4.1. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: releasenotes Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902-4.patch, HADOOP-9902-5.patch, HADOOP-9902-6.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10641) Introduce Coordination Engine
[ https://issues.apache.org/jira/browse/HADOOP-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064987#comment-14064987 ] Allen Wittenauer commented on HADOOP-10641: --- Did you mean ConsensusNameNode? Introduce Coordination Engine - Key: HADOOP-10641 URL: https://issues.apache.org/jira/browse/HADOOP-10641 Project: Hadoop Common Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Attachments: HADOOP-10641.patch, HADOOP-10641.patch, HADOOP-10641.patch, hadoop-coordination.patch Coordination Engine (CE) is a system, which allows to agree on a sequence of events in a distributed system. In order to be reliable CE should be distributed by itself. Coordination Engine can be based on different algorithms (paxos, raft, 2PC, zab) and have different implementations, depending on use cases, reliability, availability, and performance requirements. CE should have a common API, so that it could serve as a pluggable component in different projects. The immediate beneficiaries are HDFS (HDFS-6469) and HBase (HBASE-10909). First implementation is proposed to be based on ZooKeeper. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HADOOP-8100) share web server information for http filters
[ https://issues.apache.org/jira/browse/HADOOP-8100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-8100. -- Resolution: Won't Fix share web server information for http filters - Key: HADOOP-8100 URL: https://issues.apache.org/jira/browse/HADOOP-8100 Project: Hadoop Common Issue Type: New Feature Affects Versions: 1.0.0, 0.23.2, 0.24.0 Reporter: Allen Wittenauer Attachments: HADOOP-8100-branch-1.0.patch This is a simple fix which shares the web server bind information for downstream consumption by 3rd-party plugins.
[jira] [Updated] (HADOOP-8100) share web server information for http filters
[ https://issues.apache.org/jira/browse/HADOOP-8100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-8100: - Status: Open (was: Patch Available) share web server information for http filters - Key: HADOOP-8100 URL: https://issues.apache.org/jira/browse/HADOOP-8100 Project: Hadoop Common Issue Type: New Feature Affects Versions: 1.0.0, 0.23.2, 0.24.0 Reporter: Allen Wittenauer Attachments: HADOOP-8100-branch-1.0.patch This is a simple fix which shares the web server bind information for downstream consumption by 3rd-party plugins.
[jira] [Resolved] (HADOOP-8026) various shell script fixes
[ https://issues.apache.org/jira/browse/HADOOP-8026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-8026. -- Resolution: Duplicate This is all part of HADOOP-9902 now. various shell script fixes -- Key: HADOOP-8026 URL: https://issues.apache.org/jira/browse/HADOOP-8026 Project: Hadoop Common Issue Type: Bug Affects Versions: 1.0.0 Reporter: Allen Wittenauer Attachments: HADOOP-8026-branch-1.0.txt Various shell script fixes: * repair naked $0s so that directory detection works * remove superfluous JAVA_HOME settings * use /usr/bin/pdsh in slaves.sh if it exists
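The "naked $0" repair in the list above can be sketched like this. A minimal illustration of the idiom, not the attached patch; the `remote_cmd` variable is hypothetical:

```shell
# Fragile form that the fix targets: an unquoted, unresolved $0
# breaks directory detection when the script is invoked via a
# relative path, e.g.:
#   bin=`dirname $0`

# Robust form: quote $0 and resolve the directory to an absolute path.
bin="$(dirname -- "$0")"
bin="$(cd -- "${bin}" && pwd)"

# And, per the third bullet: prefer pdsh for slaves.sh fan-out
# when it is installed, falling back to ssh otherwise.
if [ -x /usr/bin/pdsh ]; then
  remote_cmd="/usr/bin/pdsh"
else
  remote_cmd="ssh"
fi
```

The `cd && pwd` round-trip guarantees `bin` is absolute even when `dirname` returns `.` or a relative path.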
[jira] [Resolved] (HADOOP-8025) change default distcp log location to be /tmp rather than cwd
[ https://issues.apache.org/jira/browse/HADOOP-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-8025. -- Resolution: Won't Fix change default distcp log location to be /tmp rather than cwd - Key: HADOOP-8025 URL: https://issues.apache.org/jira/browse/HADOOP-8025 Project: Hadoop Common Issue Type: Improvement Affects Versions: 1.0.0 Reporter: Allen Wittenauer Priority: Trivial Attachments: HADOOP-8025-branch-1.0.txt distcp loves to leave empty files around. This puts them in /tmp so at least they are easy to find and kill.
[jira] [Commented] (HADOOP-535) back to back testing of codecs
[ https://issues.apache.org/jira/browse/HADOOP-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065054#comment-14065054 ] Allen Wittenauer commented on HADOOP-535: - This was done years ago, wasn't it? back to back testing of codecs -- Key: HADOOP-535 URL: https://issues.apache.org/jira/browse/HADOOP-535 Project: Hadoop Common Issue Type: Test Components: io Reporter: Owen O'Malley Assignee: Arun C Murthy We should write some unit tests that use codecs back to back, writing and then reading: compressed block 1, compressed block 2, compressed block 3, ... This will check that the compression codecs consume the entire block when they read.
[jira] [Created] (HADOOP-10854) unit tests for the shell scripts
Allen Wittenauer created HADOOP-10854: - Summary: unit tests for the shell scripts Key: HADOOP-10854 URL: https://issues.apache.org/jira/browse/HADOOP-10854 Project: Hadoop Common Issue Type: Test Reporter: Allen Wittenauer With HADOOP-9902 moving a lot of functionality to functions, we should build some unit tests for them.
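A minimal sketch of what unit tests for refactored shell functions could look like. Both the function under test (`append_unique_path`) and the `assert_eq` helper are illustrative placeholders, not the real HADOOP-9902 functions or test framework:

```shell
# Function under test: append a directory to a colon-separated path
# only if it is not already present (the kind of small, pure function
# the shell rewrite factors out, which makes it testable).
append_unique_path() {
  case ":$1:" in
    *":$2:"*) printf '%s' "$1" ;;      # already present: unchanged
    *)        printf '%s' "$1:$2" ;;   # otherwise append
  esac
}

# Tiny assert helper: compare expected vs actual, fail loudly.
assert_eq() {
  [ "$1" = "$2" ] || { echo "FAIL: expected '$1', got '$2'"; exit 1; }
}

assert_eq "/a:/b:/c" "$(append_unique_path "/a:/b" "/c")"
assert_eq "/a:/b"    "$(append_unique_path "/a:/b" "/b")"
echo "all tests passed"
```

Because the function writes to stdout and takes all input as arguments, each test is a single command substitution with no fixture setup.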
[jira] [Resolved] (HADOOP-1024) Add stable version line to the website front page
[ https://issues.apache.org/jira/browse/HADOOP-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-1024. -- Resolution: Fixed This was done forever ago. Add stable version line to the website front page - Key: HADOOP-1024 URL: https://issues.apache.org/jira/browse/HADOOP-1024 Project: Hadoop Common Issue Type: Improvement Reporter: Owen O'Malley I think it would be worthwhile to add two lines to the top of the welcome website page: Stable version: 0.10.1 Latest version: 0.11.1 With the numbers linking off to the respective release like so: http://www.apache.org/dyn/closer.cgi/lucene/hadoop/hadoop-0.10.1.tar.gz We can promote versions from Latest to Stable when they have proven themselves. Thoughts?
[jira] [Resolved] (HADOOP-1464) IPC server should not log thread stacks at the info level
[ https://issues.apache.org/jira/browse/HADOOP-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-1464. -- Resolution: Fixed I'm going to close this out as stale. I suspect this is no longer an issue. IPC server should not log thread stacks at the info level - Key: HADOOP-1464 URL: https://issues.apache.org/jira/browse/HADOOP-1464 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 0.12.3 Reporter: Hairong Kuang Currently, when the IPC server gets a call that has become too old, i.e. the call has not been served for too long, it dumps all thread stacks to the logs at the info level. Because all the thread stacks together might be very big, it would be better to log them at the debug level.
[jira] [Resolved] (HADOOP-1496) Test coverage target in build files using emma
[ https://issues.apache.org/jira/browse/HADOOP-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-1496. -- Resolution: Won't Fix I'm going to close this now as won't fix, given the clover coverage. Test coverage target in build files using emma -- Key: HADOOP-1496 URL: https://issues.apache.org/jira/browse/HADOOP-1496 Project: Hadoop Common Issue Type: Improvement Components: build Environment: all Reporter: woyg Priority: Minor Attachments: emma.tgz, hadoop_clover.patch, patch.emma.txt, patch.emma.txt.2 Test coverage targets for Hadoop using emma. Test coverage will help in identifying the components which are not properly covered by tests, so that test cases can be written for them. Emma (http://emma.sourceforge.net/) is a good tool for coverage. If you have something else in mind, you can suggest it. I have a patch ready with emma.
[jira] [Resolved] (HADOOP-1688) TestCrcCorruption hangs on windows
[ https://issues.apache.org/jira/browse/HADOOP-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-1688. -- Resolution: Fixed Closing this as stale. TestCrcCorruption hangs on windows -- Key: HADOOP-1688 URL: https://issues.apache.org/jira/browse/HADOOP-1688 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 0.14.0 Environment: Windows Reporter: Konstantin Shvachko TestCrcCorruption times out on Windows, saying only that it timed out; there is no other useful information in the log. It is some kind of timing issue, because if I run it with output=yes then it succeeds.
[jira] [Resolved] (HADOOP-1754) A testimonial page for hadoop?
[ https://issues.apache.org/jira/browse/HADOOP-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-1754. -- Resolution: Fixed We have entire conferences now. Closing. A testimonial page for hadoop? -- Key: HADOOP-1754 URL: https://issues.apache.org/jira/browse/HADOOP-1754 Project: Hadoop Common Issue Type: Wish Components: documentation Reporter: Konstantin Shvachko Priority: Minor Should we create a testimonial page on hadoop wiki with a link from Hadoop home page so that people could share their experience of using Hadoop? I see some satisfied users out there. :)
[jira] [Commented] (HADOOP-1791) Cleanup local files command(s)
[ https://issues.apache.org/jira/browse/HADOOP-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065206#comment-14065206 ] Allen Wittenauer commented on HADOOP-1791: -- I'm not sure about -format as the option, but this would be kind of nice to have. Cleanup local files command(s) -- Key: HADOOP-1791 URL: https://issues.apache.org/jira/browse/HADOOP-1791 Project: Hadoop Common Issue Type: New Feature Components: util Affects Versions: 0.15.0 Reporter: Enis Soztutar Labels: newbie It would be good if we had a cleanup command to clean up all the local directories that any component of Hadoop uses. That way, before the cluster is restarted again, or when a machine is pulled out of the cluster, we can clean up all the local files. I propose we add {noformat} bin/hadoop datanode -format bin/hadoop tasktracker -format bin/hadoop jobtracker -format {noformat}
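Under the hood, such a `-format`-style cleanup could be as simple as the following sketch. This is purely illustrative of the proposal, not an implemented Hadoop command; `cleanup_local_dirs` is a hypothetical helper that would receive the directories read from the site configuration:

```shell
# Hypothetical cleanup helper: wipe the contents of each configured
# local directory while keeping the directories themselves, so the
# daemon can be restarted against a clean slate.
cleanup_local_dirs() {
  for dir in "$@"; do
    if [ -d "${dir}" ]; then
      # Delete regular entries and dotfiles, but keep the directory.
      # ${dir:?} aborts if dir is empty, guarding against `rm -rf /*`.
      rm -rf -- "${dir:?}"/* "${dir:?}"/.[!.]* 2>/dev/null
      echo "cleaned ${dir}"
    fi
  done
}
```

Keeping the top-level directories in place preserves their ownership and permissions, which is exactly what you want before a restart or when decommissioning a node.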