[jira] [Commented] (HADOOP-11296) hadoop-daemons.sh throws 'host1: bash: host3: command not found...'
[ https://issues.apache.org/jira/browse/HADOOP-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208697#comment-14208697 ] Allen Wittenauer commented on HADOOP-11296: --- It looks like this is dependent upon either the version of bash or the version of xargs, as I'm having trouble reproducing this on OS X. Even the test code gives me the expected output. What version is showing the problem? hadoop-daemons.sh throws 'host1: bash: host3: command not found...' --- Key: HADOOP-11296 URL: https://issues.apache.org/jira/browse/HADOOP-11296 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HADOOP-11296-001.patch, HADOOP-11296-002.patch *hadoop-daemons.sh* throws command not found. {noformat}[vinay@host2 install]$ /home/vinay/install/hadoop/sbin/hadoop-daemons.sh --config /home/vinay/install/conf --hostnames 'host1 host2' start namenode host1: bash: host2: command not found... {noformat} *hadoop-daemons.sh* is mainly used to start the cluster, e.g. via start-dfs.sh. Without this, the cluster will not be able to start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
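The symptom above reduces to how the hostname list reaches the per-host command. A minimal sketch, assuming the `--hostnames` string is word-split and fed to xargs (the hostnames, `echo ssh` stand-in, and pipeline shape are illustrative, not the actual hadoop-daemons.sh code):

```shell
# If the whole hostname list lands in a single remote invocation, everything
# after the first hostname becomes the command to run there -- exactly the
# "host1: bash: host2: command not found" symptom. Placeholder reduction:
hosts='host1 host2'

# Buggy shape: one invocation receives both hostnames.
printf '%s\n' $hosts | xargs echo ssh
# -> ssh host1 host2

# Fixed shape: -n 1 starts a separate invocation per hostname.
printf '%s\n' $hosts | xargs -n 1 echo ssh
# -> ssh host1
#    ssh host2
```

With the buggy shape, ssh connects to host1 and asks it to run "host2", which the remote bash reports as "command not found".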
[jira] [Updated] (HADOOP-11298) slaves.sh is missing a /
[ https://issues.apache.org/jira/browse/HADOOP-11298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11298: -- Component/s: scripts slaves.sh is missing a / - Key: HADOOP-11298 URL: https://issues.apache.org/jira/browse/HADOOP-11298 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Priority: Trivial Labels: newbie, shell Just need to turn dev/null into /dev/null in the cd statement in the preamble. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11298) slaves.sh is missing a /
[ https://issues.apache.org/jira/browse/HADOOP-11298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11298: -- Affects Version/s: 3.0.0 slaves.sh is missing a / - Key: HADOOP-11298 URL: https://issues.apache.org/jira/browse/HADOOP-11298 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Priority: Trivial Labels: newbie, shell Just need to turn dev/null into /dev/null in the cd statement in the preamble. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11298) slaves.sh is missing a /
Allen Wittenauer created HADOOP-11298: - Summary: slaves.sh is missing a / Key: HADOOP-11298 URL: https://issues.apache.org/jira/browse/HADOOP-11298 Project: Hadoop Common Issue Type: Bug Reporter: Allen Wittenauer Priority: Trivial Just need to turn dev/null into /dev/null in the cd statement in the preamble. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
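The effect of the missing slash is easy to see in isolation. A sketch of the bug class, assuming a preamble-style redirection (the exact slaves.sh preamble line is not reproduced here):

```shell
# In a fresh directory with no "dev" subdirectory, "dev/null" is a relative
# path, so the redirection itself fails and the command never runs.
tmp=$(mktemp -d) && cd "${tmp}"

( echo probe > dev/null ) 2>/dev/null || echo "relative dev/null: redirection failed"

# With the leading slash, output is discarded as intended.
echo probe > /dev/null && echo "absolute /dev/null: ok"
```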
[jira] [Commented] (HADOOP-11296) hadoop-daemons.sh throws 'host1: bash: host3: command not found...'
[ https://issues.apache.org/jira/browse/HADOOP-11296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208735#comment-14208735 ] Allen Wittenauer commented on HADOOP-11296: --- OK, it looks like OS X is the outlier. I've been able to reproduce this on both Linux and Illumos. As part of that, it looks like Illumos xargs doesn't support the -P parameter (SVID issue? non-POSIX extension?). At first glance, the patch seems reasonable, but I want to test a few things out. :) hadoop-daemons.sh throws 'host1: bash: host3: command not found...' --- Key: HADOOP-11296 URL: https://issues.apache.org/jira/browse/HADOOP-11296 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HADOOP-11296-001.patch, HADOOP-11296-002.patch *hadoop-daemons.sh* throws command not found. {noformat}[vinay@host2 install]$ /home/vinay/install/hadoop/sbin/hadoop-daemons.sh --config /home/vinay/install/conf --hostnames 'host1 host2' start namenode host1: bash: host2: command not found... {noformat} *hadoop-daemons.sh* is mainly used to start the cluster, e.g. via start-dfs.sh. Without this, the cluster will not be able to start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11257) Update hadoop jar documentation to warn against using it for launching yarn jars
[ https://issues.apache.org/jira/browse/HADOOP-11257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208891#comment-14208891 ] Allen Wittenauer commented on HADOOP-11257: --- While this is nice and all, I'm not really sure if it ultimately fixes anything. We still have two code paths to test. Update hadoop jar documentation to warn against using it for launching yarn jars -- Key: HADOOP-11257 URL: https://issues.apache.org/jira/browse/HADOOP-11257 Project: Hadoop Common Issue Type: Improvement Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Attachments: HADOOP-11257.1.patch, HADOOP-11257.1.patch, HADOOP-11257.2.patch, HADOOP-11257.3.patch We should update the hadoop jar documentation to warn against using it for launching yarn jars. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11025) hadoop-daemons.sh should just call hdfs directly
[ https://issues.apache.org/jira/browse/HADOOP-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208910#comment-14208910 ] Allen Wittenauer commented on HADOOP-11025: --- +1 will commit to trunk. Thanks!! hadoop-daemons.sh should just call hdfs directly Key: HADOOP-11025 URL: https://issues.apache.org/jira/browse/HADOOP-11025 Project: Hadoop Common Issue Type: Improvement Components: scripts Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Attachments: HADOOP-11025.1.patch, HADOOP-11025.2.patch There is little-to-no reason for it to call hadoop-daemon.sh anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HADOOP-11025) hadoop-daemons.sh should just call hdfs directly
[ https://issues.apache.org/jira/browse/HADOOP-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-11025. --- Resolution: Fixed Fix Version/s: 3.0.0 Committed to trunk. hadoop-daemons.sh should just call hdfs directly Key: HADOOP-11025 URL: https://issues.apache.org/jira/browse/HADOOP-11025 Project: Hadoop Common Issue Type: Improvement Components: scripts Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Fix For: 3.0.0 Attachments: HADOOP-11025.1.patch, HADOOP-11025.2.patch There is little-to-no reason for it to call hadoop-daemon.sh anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11284) Fix variable name mismatch in hadoop-functions.sh
[ https://issues.apache.org/jira/browse/HADOOP-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208937#comment-14208937 ] Allen Wittenauer commented on HADOOP-11284: --- Ugh. I really messed these up. Thanks for finding them. :) +1 will commit to trunk. Fix variable name mismatch in hadoop-functions.sh - Key: HADOOP-11284 URL: https://issues.apache.org/jira/browse/HADOOP-11284 Project: Hadoop Common Issue Type: Bug Components: scripts Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Attachments: HADOOP-11284.1.patch Some functions use variables that are not passed as arguments but are defined outside the function. These variables are used as the pid file name. Though hadoop-functions.sh works by chance now, it should be fixed to avoid future bugs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
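The pattern being fixed can be sketched like this (function and variable names are illustrative, not the actual hadoop-functions.sh code):

```shell
pf=$(mktemp)

# Fragile: the function silently reads a variable defined outside itself,
# so it "works by chance" only while the caller happens to set ${pf}.
write_pid_fragile() {
  echo "$$" > "${pf}"
}

# Robust: the pid file path is an explicit argument, so a rename in the
# caller cannot silently break the function.
write_pid_robust() {
  target=$1
  echo "$$" > "${target}"
}

write_pid_robust "${pf}"
cat "${pf}"
```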
[jira] [Updated] (HADOOP-11284) Fix variable name mismatches in hadoop-functions.sh
[ https://issues.apache.org/jira/browse/HADOOP-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11284: -- Summary: Fix variable name mismatches in hadoop-functions.sh (was: Fix variable name mismatch in hadoop-functions.sh) Fix variable name mismatches in hadoop-functions.sh --- Key: HADOOP-11284 URL: https://issues.apache.org/jira/browse/HADOOP-11284 Project: Hadoop Common Issue Type: Bug Components: scripts Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Attachments: HADOOP-11284.1.patch Some functions use variables that are not passed as arguments but are defined outside the function. These variables are used as the pid file name. Though hadoop-functions.sh works by chance now, it should be fixed to avoid future bugs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HADOOP-11284) Fix variable name mismatches in hadoop-functions.sh
[ https://issues.apache.org/jira/browse/HADOOP-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-11284. --- Resolution: Fixed Fix Version/s: 3.0.0 Committed to trunk. Thanks! Fix variable name mismatches in hadoop-functions.sh --- Key: HADOOP-11284 URL: https://issues.apache.org/jira/browse/HADOOP-11284 Project: Hadoop Common Issue Type: Bug Components: scripts Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Fix For: 3.0.0 Attachments: HADOOP-11284.1.patch Some functions use variables that are not passed as arguments but are defined outside the function. These variables are used as the pid file name. Though hadoop-functions.sh works by chance now, it should be fixed to avoid future bugs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11298) slaves.sh and stop-all.sh are missing slashes
[ https://issues.apache.org/jira/browse/HADOOP-11298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11298: -- Summary: slaves.sh and stop-all.sh are missing slashes (was: slaves.sh is missing a / ) slaves.sh and stop-all.sh are missing slashes -- Key: HADOOP-11298 URL: https://issues.apache.org/jira/browse/HADOOP-11298 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Priority: Trivial Labels: newbie, shell Just need to turn dev/null into /dev/null in the cd statement in the preamble. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11298) slaves.sh and stop-all.sh are missing slashes
[ https://issues.apache.org/jira/browse/HADOOP-11298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11298: -- Status: Patch Available (was: Open) slaves.sh and stop-all.sh are missing slashes -- Key: HADOOP-11298 URL: https://issues.apache.org/jira/browse/HADOOP-11298 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Priority: Trivial Labels: newbie, shell Attachments: HADOOP-11298.patch Just need to turn dev/null into /dev/null in the cd statement in the preamble. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11298) slaves.sh and stop-all.sh are missing slashes
[ https://issues.apache.org/jira/browse/HADOOP-11298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11298: -- Attachment: HADOOP-11298.patch Effectively, a two-character patch. I wonder if this is a record. slaves.sh and stop-all.sh are missing slashes -- Key: HADOOP-11298 URL: https://issues.apache.org/jira/browse/HADOOP-11298 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Priority: Trivial Labels: newbie, shell Attachments: HADOOP-11298.patch Just need to turn dev/null into /dev/null in the cd statement in the preamble. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HADOOP-11298) slaves.sh and stop-all.sh are missing slashes
[ https://issues.apache.org/jira/browse/HADOOP-11298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer reassigned HADOOP-11298: - Assignee: Allen Wittenauer slaves.sh and stop-all.sh are missing slashes -- Key: HADOOP-11298 URL: https://issues.apache.org/jira/browse/HADOOP-11298 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Trivial Labels: newbie, shell Attachments: HADOOP-11298.patch Just need to turn dev/null into /dev/null in the cd statement in the preamble. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11278) hadoop-daemon.sh script doesn't hornor --config option
[ https://issues.apache.org/jira/browse/HADOOP-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208996#comment-14208996 ] Allen Wittenauer commented on HADOOP-11278: --- I realize this is closed, but is there an error condition we should be checking for to help prevent this issue in the future? hadoop-daemon.sh script doesn't hornor --config option -- Key: HADOOP-11278 URL: https://issues.apache.org/jira/browse/HADOOP-11278 Project: Hadoop Common Issue Type: Bug Components: bin Affects Versions: 3.0.0 Reporter: Brandon Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11278) hadoop-daemon.sh script doesn't honor --config option
[ https://issues.apache.org/jira/browse/HADOOP-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11278: -- Summary: hadoop-daemon.sh script doesn't honor --config option (was: hadoop-daemon.sh script doesn't hornor --config option) hadoop-daemon.sh script doesn't honor --config option - Key: HADOOP-11278 URL: https://issues.apache.org/jira/browse/HADOOP-11278 Project: Hadoop Common Issue Type: Bug Components: bin Affects Versions: 3.0.0 Reporter: Brandon Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11300) KMS startup scripts must not display the keystore / truststore passwords
[ https://issues.apache.org/jira/browse/HADOOP-11300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14209068#comment-14209068 ] Allen Wittenauer commented on HADOOP-11300: --- Umm, given this is a -D setting, doesn't that also mean it's passed on the command line... which in turn means that anyone doing a ps/reading /proc will also see the password? It sounds like this security service has a pretty major security hole... KMS startup scripts must not display the keystore / truststore passwords Key: HADOOP-11300 URL: https://issues.apache.org/jira/browse/HADOOP-11300 Project: Hadoop Common Issue Type: Bug Components: kms Reporter: Arun Suresh Attachments: HADOOP-11300.1.patch Sample output of the KMS startup scripts: {noformat} Setting KMS_HOME: /usr/lib/hadoop-kms Using KMS_CONFIG:/var/run/kms-config/ Using KMS_LOG: /var/log/kms-log Using KMS_TEMP: /var/run/kms-tmp/ Using KMS_HTTP_PORT: 16000 Using KMS_ADMIN_PORT: 16001 Using KMS_MAX_THREADS: 250 Using KMS_SSL_KEYSTORE_FILE: /etc/conf/kms-keystore.jks Using KMS_SSL_KEYSTORE_PASS: keystorepass Using CATALINA_BASE: /var/lib/kms/tomcat-deployment Using KMS_CATALINA_HOME: /usr/lib/hadoop-kms/lib/bigtop-tomcat Setting CATALINA_OUT:/var/log/kms-log/kms-catalina.out Setting CATALINA_PID:/tmp/kms.pid Using CATALINA_OPTS: . -Djavax.net.ssl.trustStorePassword=truststorepass Adding to CATALINA_OPTS: -Dkms.home.dir=.. -Dkms.ssl.keystore.pass= keystorepass {noformat} The keystore password and truststore password are in clear text, which should be masked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
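The ps concern raised in the comment is easy to demonstrate with any process, not just the KMS. A sketch, assuming a stand-in sh/sleep process in place of Tomcat (the property name is copied from the log excerpt above; the value is a placeholder):

```shell
# Anything passed as a -D system property is part of the process argv, so
# any local user can read it from the process table via ps or /proc.
# Stand-in process carrying a "password" argument:
sh -c 'sleep 2; :' sh -Dkms.ssl.keystore.pass=secretvalue &
pid=$!

# The secret is visible in the full argument list.
ps -o args= -p "${pid}" | grep -o 'pass=secretvalue'

kill "${pid}" 2>/dev/null
```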
[jira] [Updated] (HADOOP-11208) Replace daemon with better name in scripts like hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
[ https://issues.apache.org/jira/browse/HADOOP-11208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11208: -- Affects Version/s: 3.0.0 Replace daemon with better name in scripts like hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs --- Key: HADOOP-11208 URL: https://issues.apache.org/jira/browse/HADOOP-11208 Project: Hadoop Common Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Yongjun Zhang Assignee: Allen Wittenauer Per discussion in HDFS-7204, creating this jira. Thanks [~aw] for the work on HDFS-7204. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11208) Replace daemon with better name in scripts like hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
[ https://issues.apache.org/jira/browse/HADOOP-11208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11208: -- Attachment: HADOOP-11208.patch Changes the local daemon variable to supportdaemonization. Documentation change will be made on the wiki to the shell scripting guide after commit. Replace daemon with better name in scripts like hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs --- Key: HADOOP-11208 URL: https://issues.apache.org/jira/browse/HADOOP-11208 Project: Hadoop Common Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Yongjun Zhang Assignee: Allen Wittenauer Attachments: HADOOP-11208.patch Per discussion in HDFS-7204, creating this jira. Thanks [~aw] for the work on HDFS-7204. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HADOOP-11208) Replace daemon with better name in scripts like hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
[ https://issues.apache.org/jira/browse/HADOOP-11208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer reassigned HADOOP-11208: - Assignee: Allen Wittenauer Replace daemon with better name in scripts like hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs --- Key: HADOOP-11208 URL: https://issues.apache.org/jira/browse/HADOOP-11208 Project: Hadoop Common Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Yongjun Zhang Assignee: Allen Wittenauer Per discussion in HDFS-7204, creating this jira. Thanks [~aw] for the work on HDFS-7204. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11278) hadoop-daemon.sh script doesn't honor --config option
[ https://issues.apache.org/jira/browse/HADOOP-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14209171#comment-14209171 ] Allen Wittenauer commented on HADOOP-11278: --- Yeah, --debug isn't passed through and it probably should be. I'm sure that would have helped tremendously! hadoop-daemon.sh script doesn't honor --config option - Key: HADOOP-11278 URL: https://issues.apache.org/jira/browse/HADOOP-11278 Project: Hadoop Common Issue Type: Bug Components: bin Affects Versions: 3.0.0 Reporter: Brandon Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-8989) hadoop fs -find feature
[ https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-8989: - Summary: hadoop fs -find feature (was: hadoop dfs -find feature) hadoop fs -find feature --- Key: HADOOP-8989 URL: https://issues.apache.org/jira/browse/HADOOP-8989 Project: Hadoop Common Issue Type: New Feature Reporter: Marco Nicosia Assignee: Jonathan Allen Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch Both sysadmins and users make frequent use of the unix 'find' command, but Hadoop has no correlate. Without this, users are writing scripts which make heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs -lsr is somewhat taxing on the NameNode, and a really slow experience on the client side. Possibly an in-NameNode find operation would be only a bit more taxing on the NameNode, but significantly faster from the client's point of view? The minimum set of options I can think of which would make a Hadoop find command generally useful is (in priority order): * -type (file or directory, for now) * -atime/-ctime/-mtime (... and -creationtime?) (both + and - arguments) * -print0 (for piping to xargs -0) * -depth * -owner/-group (and -nouser/-nogroup) * -name (allowing for shell pattern, or even regex?) * -perm * -size One possible special case, but could possibly be really cool if it ran from within the NameNode: * -delete The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow. 
Lower priority, some people do use operators, mostly to execute -or searches such as: * find / \(-nouser -or -nogroup\) Finally, I thought I'd include a link to the [Posix spec for find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
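For readers less familiar with the option subset proposed above, the plain POSIX find equivalents behave like this on a throwaway tree (paths and file names are placeholders):

```shell
# Build a tiny tree to filter.
tmp=$(mktemp -d)
mkdir -p "${tmp}/logs"
touch "${tmp}/logs/a.log" "${tmp}/data.txt"

find "${tmp}" -type f -name '*.log'           # -type + -name filtering
find "${tmp}" -type d                         # directories only
find "${tmp}" -type f -print0 | xargs -0 ls   # NUL-safe piping: the -print0 case
```

The proposal is essentially to mirror this behavior over HDFS paths instead of local ones.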
[jira] [Updated] (HADOOP-8989) hadoop fs -find feature
[ https://issues.apache.org/jira/browse/HADOOP-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-8989: - Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) With several +1's, I've committed this to branch-2 and trunk. Thanks! hadoop fs -find feature --- Key: HADOOP-8989 URL: https://issues.apache.org/jira/browse/HADOOP-8989 Project: Hadoop Common Issue Type: New Feature Reporter: Marco Nicosia Assignee: Jonathan Allen Fix For: 2.7.0 Attachments: HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch, HADOOP-8989.patch Both sysadmins and users make frequent use of the unix 'find' command, but Hadoop has no correlate. Without this, users are writing scripts which make heavy use of hadoop dfs -lsr, and implementing find one-offs. I think hdfs -lsr is somewhat taxing on the NameNode, and a really slow experience on the client side. Possibly an in-NameNode find operation would be only a bit more taxing on the NameNode, but significantly faster from the client's point of view? The minimum set of options I can think of which would make a Hadoop find command generally useful is (in priority order): * -type (file or directory, for now) * -atime/-ctime/-mtime (... and -creationtime?) (both + and - arguments) * -print0 (for piping to xargs -0) * -depth * -owner/-group (and -nouser/-nogroup) * -name (allowing for shell pattern, or even regex?) * -perm * -size One possible special case, but could possibly be really cool if it ran from within the NameNode: * -delete The hadoop dfs -lsr | hadoop dfs -rm cycle is really, really slow. 
Lower priority, some people do use operators, mostly to execute -or searches such as: * find / \(-nouser -or -nogroup\) Finally, I thought I'd include a link to the [Posix spec for find|http://www.opengroup.org/onlinepubs/009695399/utilities/find.html] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11298) slaves.sh and stop-all.sh are missing slashes
[ https://issues.apache.org/jira/browse/HADOOP-11298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11298: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Thanks! Committed to trunk. slaves.sh and stop-all.sh are missing slashes -- Key: HADOOP-11298 URL: https://issues.apache.org/jira/browse/HADOOP-11298 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Trivial Labels: newbie, shell Fix For: 3.0.0 Attachments: HADOOP-11298.patch Just need to turn dev/null into /dev/null in the cd statement in the preamble. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11150) hadoop command should show the reason on failure by invalid COMMAND or CLASSNAME
[ https://issues.apache.org/jira/browse/HADOOP-11150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11150: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) +1 will commit to trunk. Thanks! hadoop command should show the reason on failure by invalid COMMAND or CLASSNAME Key: HADOOP-11150 URL: https://issues.apache.org/jira/browse/HADOOP-11150 Project: Hadoop Common Issue Type: Improvement Components: scripts Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Fix For: 3.0.0 Attachments: HADOOP-11150-0.patch, HADOOP-11150-1.patch hadoop_validate_classname checks whether the classname contains a '.'. It is possible that a classname without a package is used in some examples or tutorials. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
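A minimal sketch of the kind of check the description refers to (the real hadoop_validate_classname lives in hadoop-functions.sh; this reimplementation and its error text are illustrative only):

```shell
# Treat only names containing a '.' as fully-qualified class names, and
# report a reason instead of failing silently.
validate_classname() {
  case "$1" in
    *.*) return 0 ;;
    *)   echo "ERROR: $1 is not a fully-qualified class name" >&2; return 1 ;;
  esac
}

validate_classname org.example.Main && echo accepted
validate_classname Main 2>/dev/null || echo rejected
```

As the description notes, bare class names without a package do appear in examples and tutorials, which is why rejecting them needs an explanatory message.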
[jira] [Commented] (HADOOP-11300) KMS startup scripts must not display the keystore / truststore passwords
[ https://issues.apache.org/jira/browse/HADOOP-11300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14213227#comment-14213227 ] Allen Wittenauer commented on HADOOP-11300: --- This feels extremely fragile, but since it's tomcat there's only so much we can do. :( Hopefully in the future we can dump tomcat and all of its extra baggage/issues. KMS startup scripts must not display the keystore / truststore passwords Key: HADOOP-11300 URL: https://issues.apache.org/jira/browse/HADOOP-11300 Project: Hadoop Common Issue Type: Bug Components: kms Affects Versions: 2.6.0 Reporter: Arun Suresh Assignee: Arun Suresh Attachments: HADOOP-11300.1.patch, HADOOP-11300.2.patch Sample output of the KMS startup scripts: {noformat} Setting KMS_HOME: /usr/lib/hadoop-kms Using KMS_CONFIG:/var/run/kms-config/ Using KMS_LOG: /var/log/kms-log Using KMS_TEMP: /var/run/kms-tmp/ Using KMS_HTTP_PORT: 16000 Using KMS_ADMIN_PORT: 16001 Using KMS_MAX_THREADS: 250 Using KMS_SSL_KEYSTORE_FILE: /etc/conf/kms-keystore.jks Using KMS_SSL_KEYSTORE_PASS: keystorepass Using CATALINA_BASE: /var/lib/kms/tomcat-deployment Using KMS_CATALINA_HOME: /usr/lib/hadoop-kms/lib/bigtop-tomcat Setting CATALINA_OUT:/var/log/kms-log/kms-catalina.out Setting CATALINA_PID:/tmp/kms.pid Using CATALINA_OPTS: . -Djavax.net.ssl.trustStorePassword=truststorepass Adding to CATALINA_OPTS: -Dkms.home.dir=.. -Dkms.ssl.keystore.pass= keystorepass {noformat} The keystore password and truststore password are in clear text, which should be masked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-6962) FileSystem.mkdirs(Path, FSPermission) should use the permission for all of the created directories
[ https://issues.apache.org/jira/browse/HADOOP-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528431#comment-13528431 ] Allen Wittenauer commented on HADOOP-6962: -- Is there any reason not to make this a blocker for 2.0 or even 1.2.0? This is really causing us 'out here' a lot of pain and really needs to get fixed. FileSystem.mkdirs(Path, FSPermission) should use the permission for all of the created directories -- Key: HADOOP-6962 URL: https://issues.apache.org/jira/browse/HADOOP-6962 Project: Hadoop Common Issue Type: Bug Components: fs Reporter: Owen O'Malley Assignee: Daryn Sharp Attachments: HADOOP-6962.patch Currently, FileSystem.mkdirs only applies the permissions to the last level if it was created. It should be applied to *all* levels that are created. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
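The last-level-only behavior described in the issue also exists in plain mkdir -p -m, which makes it easy to see locally why permission inheritance makes this painful (GNU stat syntax assumed; on BSD/macOS the equivalent is stat -f %Lp):

```shell
# mkdir -m applies the requested mode only to the final directory; the
# intermediate directories it creates get the default mode under the umask.
umask 022
root=$(mktemp -d)
mkdir -p -m 700 "${root}/a/b/c"

stat -c '%a' "${root}/a/b/c"   # 700: -m applied to the last level only
stat -c '%a' "${root}/a"       # 755: intermediate level got 777 & ~umask
```

FileSystem.mkdirs has the same shape of problem: the FsPermission reaches only the leaf, while every ancestor it creates falls back to defaults (or, with HDFS inheritance, the parent's permissions).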
[jira] [Updated] (HADOOP-6962) FileSystem.mkdirs(Path, FSPermission) should use the permission for all of the created directories
[ https://issues.apache.org/jira/browse/HADOOP-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-6962: - Labels: security (was: ) FileSystem.mkdirs(Path, FSPermission) should use the permission for all of the created directories -- Key: HADOOP-6962 URL: https://issues.apache.org/jira/browse/HADOOP-6962 Project: Hadoop Common Issue Type: Bug Components: fs Reporter: Owen O'Malley Assignee: Daryn Sharp Labels: security Attachments: HADOOP-6962.patch Currently, FileSystem.mkdirs only applies the permissions to the last level if it was created. It should be applied to *all* levels that are created. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9160) Adopt JMX for management protocols
[ https://issues.apache.org/jira/browse/HADOOP-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541180#comment-13541180 ] Allen Wittenauer commented on HADOOP-9160: -- bq. The fsck operation takes a long time to complete, could have lot of output streamed as response for a long time. HDFS-2538 fixes this problem: output is reduced, fsck runs faster, and it's much easier for ops teams to build tools around. From a JMX perspective, it would just need to provide a percentage. I can understand the desire to do JMX. It's a de facto standard supported by many industry tools. That said... If we put admin interfaces in JMX, then we need to be concerned about security. When I last looked at it, JMX requires the use of keystores full of certs in order to handle multiple identities. PKI+keystores means a lot of pain on the ops side of the house. So if we enable JMX for any 'writable' interfaces, we need to have a way to turn it off so that those of us that don't want to go through that pain and still have a secure system can stick with Hadoop RPC/HTTP with GSSAPI/SPNEGO. Adopt JMX for management protocols -- Key: HADOOP-9160 URL: https://issues.apache.org/jira/browse/HADOOP-9160 Project: Hadoop Common Issue Type: Improvement Reporter: Luke Lu Currently we use Hadoop RPC (and some HTTP, notably fsck) for admin protocols. We should consider adopting JMX for future admin protocols, as it's the industry standard for java server management with wide client support. Having an alternative/redundant RPC mechanism is very desirable for admin protocols. I've seen multiple cases in the past where NN and/or JT RPC were locked up solid due to various bugs and/or RPC thread pool exhaustion, while HTTP and/or JMX worked just fine. Other desirable benefits include admin protocol backward compatibility and introspectability, which is convenient for a centralized management system to manage multiple Hadoop clusters of different versions. 
Another notable benefit is that it's much easier to implement new admin commands in JMX (especially with MXBean) than Hadoop RPC, especially in trunk (as well as 0.23+ and 2.x). Since Hadoop RPC doesn't guarantee backward compatibility (probably not ever for branch-1), there are few external tools depending on it. We can keep the old protocols for as long as needed. New commands should be in JMX. The transition can be gradual and backward-compatible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9164) Add version number and/or library file name to native library for easy tracking
[ https://issues.apache.org/jira/browse/HADOOP-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541181#comment-13541181 ] Allen Wittenauer commented on HADOOP-9164: -- bq. I don't think C/C++ library version numbers are the most interesting things to report, though. The reality is, we very seldom increment those numbers, so just knowing that you're using libhadoop-1.0.0 doesn't give you much information (there were never any other version numbers for that library :\ ) This should probably get fixed as part of this patch. Add version number and/or library file name to native library for easy tracking --- Key: HADOOP-9164 URL: https://issues.apache.org/jira/browse/HADOOP-9164 Project: Hadoop Common Issue Type: Improvement Components: native Affects Versions: 2.0.2-alpha Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HADOOP-9164.v1.patch, HADOOP-9164.v2.patch, HADOOP-9164.v3.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-6962) FileSystem.mkdirs(Path, FSPermission) should use the permission for all of the created directories
[ https://issues.apache.org/jira/browse/HADOOP-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-6962: - Priority: Blocker (was: Critical) Fix Version/s: 1.2.0 I'm changing this to a blocker for 1.2.0. This is a pretty major security hole when one considers that HDFS does permission inheritance. The only real choices appear to be: a) use 0777 + applied umask (i.e., POSIX), or b) use inherited perms + applied umask (what I remember from the testing we did in Hadoop 0.14/15-ish). I don't view this as a backwards-compatibility problem so much as a regression. I'm fairly confident that at some point in time this was working as intended (option b), but somewhere along the way it broke and no one noticed. FileSystem.mkdirs(Path, FSPermission) should use the permission for all of the created directories -- Key: HADOOP-6962 URL: https://issues.apache.org/jira/browse/HADOOP-6962 Project: Hadoop Common Issue Type: Bug Components: fs, security Affects Versions: 1.0.4 Reporter: Owen O'Malley Assignee: Daryn Sharp Priority: Blocker Labels: security Fix For: 1.2.0 Attachments: HADOOP-6962.patch Currently, FileSystem.mkdirs only applies the permissions to the last level if it was created. It should be applied to *all* levels that are created. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9160) Adopt JMX for management protocols
[ https://issues.apache.org/jira/browse/HADOOP-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13558952#comment-13558952 ] Allen Wittenauer commented on HADOOP-9160: -- bq. The users of the protocols are sysadmins and management daemons. As a member of this subset, and as mentioned previously, I want the ability to turn off writes to guarantee that JMX can be used as a read-only interface. I'll -1 any patch that doesn't have it. Adopt JMX for management protocols -- Key: HADOOP-9160 URL: https://issues.apache.org/jira/browse/HADOOP-9160 Project: Hadoop Common Issue Type: Improvement Reporter: Luke Lu Attachments: hadoop-9160-demo-branch-1.txt Currently we use Hadoop RPC (and some HTTP, notably fsck) for admin protocols. We should consider adopting JMX for future admin protocols, as it's the industry standard for Java server management with wide client support. Having an alternative/redundant RPC mechanism is very desirable for admin protocols. I've seen multiple cases in the past where NN and/or JT RPC were locked up solid due to various bugs and/or RPC thread-pool exhaustion, while HTTP and/or JMX worked just fine. Other desirable benefits include admin protocol backward compatibility and introspectability, which is convenient for a centralized management system managing multiple Hadoop clusters of different versions. Another notable benefit is that it's much easier to implement new admin commands in JMX (especially with MXBean) than Hadoop RPC, especially in trunk (as well as 0.23+ and 2.x). Since Hadoop RPC doesn't guarantee backward compatibility (probably not ever for branch-1), there are few external tools depending on it. We can keep the old protocols for as long as needed. New commands should be in JMX. The transition can be gradual and backward-compatible. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-6962) FileSystem.mkdirs(Path, FSPermission) should use the permission for all of the created directories
[ https://issues.apache.org/jira/browse/HADOOP-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-6962: - Target Version/s: 1.2.0 Fix Version/s: (was: 1.2.0) My tests with the current patch did not work on 1.0.4 when running a MapReduce program. FileSystem.mkdirs(Path, FSPermission) should use the permission for all of the created directories -- Key: HADOOP-6962 URL: https://issues.apache.org/jira/browse/HADOOP-6962 Project: Hadoop Common Issue Type: Bug Components: fs, security Affects Versions: 1.0.4 Reporter: Owen O'Malley Assignee: Daryn Sharp Priority: Blocker Labels: security Attachments: HADOOP-6962.patch Currently, FileSystem.mkdirs only applies the permissions to the last level if it was created. It should be applied to *all* levels that are created. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9296) Authenticating users from different realm without a trust relationship
[ https://issues.apache.org/jira/browse/HADOOP-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577796#comment-13577796 ] Allen Wittenauer commented on HADOOP-9296: -- How does this work when multiple grids are involved? i.e. distcp Authenticating users from different realm without a trust relationship -- Key: HADOOP-9296 URL: https://issues.apache.org/jira/browse/HADOOP-9296 Project: Hadoop Common Issue Type: Improvement Components: security Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HADOOP-9296-1.1.patch, multirealm.pdf Hadoop Masters (JobTracker and NameNode) and slaves (DataNode and TaskTracker) are part of the Hadoop domain, controlled by the Hadoop Active Directory. The users belong to the CORP domain, controlled by the CORP Active Directory. In the absence of a one-way trust from HADOOP DOMAIN to CORP DOMAIN, how will Hadoop servers (JobTracker, NameNode) authenticate CORP users? The solution and implementation details are in the attachment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9317) User cannot specify a kerberos keytab for commands
[ https://issues.apache.org/jira/browse/HADOOP-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582660#comment-13582660 ] Allen Wittenauer commented on HADOOP-9317: -- Maybe I'm missing something, but I don't understand why just using a different KRB5CCNAME for every invocation doesn't fix this. i.e., program flow should be: {code} export KRB5CCNAME=/tmp/mycoolcache.$$ kinit -k -t keytab identity hadoop jar blah rm /tmp/mycoolcache.$$ {code} You could even be smarter and check the creation timestamp vs. expiry. Additionally, I'm not sure, but I don't think kinit -R removes the file. (But I could be wrong.) User cannot specify a kerberos keytab for commands -- Key: HADOOP-9317 URL: https://issues.apache.org/jira/browse/HADOOP-9317 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HADOOP-9317.branch-23.patch, HADOOP-9317.branch-23.patch, HADOOP-9317.patch, HADOOP-9317.patch, HADOOP-9317.patch {{UserGroupInformation}} only allows kerberos users to be logged in via the ticket cache when running hadoop commands. {{UGI}} allows a keytab to be used, but it's only exposed programmatically. This forces keytab-based users running hadoop commands to periodically issue a kinit from the keytab. A race condition exists during the kinit when the ticket cache is deleted and re-created. Hadoop commands will fail when the ticket cache momentarily does not exist. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
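The per-invocation cache flow in the HADOOP-9317 comment above can be fleshed out slightly with a cleanup trap. This is a sketch only: the keytab path, principal, and hadoop invocation are placeholders, and the kinit/hadoop lines are commented out so the cache handling itself can be exercised stand-alone.

```shell
#!/usr/bin/env sh
# Sketch of a per-invocation Kerberos credential cache, per the comment above.
# A unique cache per process sidesteps the shared-cache delete/re-create race.
export KRB5CCNAME="/tmp/mycoolcache.$$"
# Remove the cache even if the command is interrupted.
trap 'rm -f "${KRB5CCNAME}"' EXIT
# kinit -k -t /path/to/keytab identity    # placeholder keytab and principal
# hadoop jar blah                         # placeholder hadoop command
: > "${KRB5CCNAME}"                       # stand-in for kinit writing the cache
echo "using credential cache ${KRB5CCNAME}"
```

The "creation timestamp vs. expiry" refinement mentioned above would simply skip the kinit when the existing cache is still fresh.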
[jira] [Commented] (HADOOP-9296) Authenticating users from different realm without a trust relationship
[ https://issues.apache.org/jira/browse/HADOOP-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582674#comment-13582674 ] Allen Wittenauer commented on HADOOP-9296: -- After more thought: as far as I can tell, this doesn't actually do anything to protect the web interfaces for the TaskTracker or the DataNode. I'm guessing this is built around the idea that something else is protecting those, or that the user will always connect to the JT or NN first in order to get a delegation token? Also, how does SPNEGO for the NN/2NN work under this scenario? Will the hdfs user need to come from the user realm as well? I recognize this is a kludge for companies whose broken policies and politics make them unwilling, for whatever reason, to do Kerberos properly with a one-way trust. But I'm worried this is going to give a false sense of security without making sure that other things are in place. At a minimum, the documentation accompanying this change should be explicit about its use cases and promote the usage of real trusts. Authenticating users from different realm without a trust relationship -- Key: HADOOP-9296 URL: https://issues.apache.org/jira/browse/HADOOP-9296 Project: Hadoop Common Issue Type: Improvement Components: security Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HADOOP-9296-1.1.patch, multirealm.pdf Hadoop Masters (JobTracker and NameNode) and slaves (DataNode and TaskTracker) are part of the Hadoop domain, controlled by the Hadoop Active Directory. The users belong to the CORP domain, controlled by the CORP Active Directory. In the absence of a one-way trust from HADOOP DOMAIN to CORP DOMAIN, how will Hadoop servers (JobTracker, NameNode) authenticate CORP users? The solution and implementation details are in the attachment. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9520) _HOST doesn't resolve to bound interface
Allen Wittenauer created HADOOP-9520: Summary: _HOST doesn't resolve to bound interface Key: HADOOP-9520 URL: https://issues.apache.org/jira/browse/HADOOP-9520 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer _HOST appears to ignore bound interfaces. For example, if a host has two interfaces such that: nic0 = gethostname() nic1 = someothername and then I configure the namenode or resource manager to use someothername:, the system still treats _HOST = nic0. This is especially harmful for Kerberos principals. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
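To make the HADOOP-9520 report concrete, here is a toy demonstration of the substitution. This is not Hadoop's actual `_HOST` resolution code, just an illustration of how expanding `_HOST` from `gethostname()` rather than the configured bind address yields the wrong Kerberos principal; the hostnames and realm are invented.

```shell
#!/usr/bin/env sh
# Toy demonstration of _HOST expansion (not Hadoop's real code).
principal_pattern="nn/_HOST@EXAMPLE.COM"
configured_bind="someothername"   # what the NN/RM was configured to listen on
system_hostname="nic0"            # what gethostname() returns

# Reported behaviour: _HOST expands from gethostname().
actual="$(echo "${principal_pattern}" | sed "s/_HOST/${system_hostname}/")"
# Desired behaviour: _HOST expands from the configured bind address.
expected="$(echo "${principal_pattern}" | sed "s/_HOST/${configured_bind}/")"

echo "actual:   ${actual}"
echo "expected: ${expected}"
```

With a keytab keyed to `nn/someothername@EXAMPLE.COM`, the mismatched expansion makes authentication fail until the principal is hard-coded.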
[jira] [Created] (HADOOP-9521) krb5 replay error triggers log file DoS with Safari
Allen Wittenauer created HADOOP-9521: Summary: krb5 replay error triggers log file DoS with Safari Key: HADOOP-9521 URL: https://issues.apache.org/jira/browse/HADOOP-9521 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Priority: Blocker While investigating YARN-621, looking at the web interface with Safari triggered a loop which both filled the log with stack traces as well as left the browser in a continual loading situation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9521) krb5 replay error triggers log file DoS with Safari
[ https://issues.apache.org/jira/browse/HADOOP-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9521: - Component/s: security Environment: Mac OS X 10.8.3, Safari 6.0.3 (8536.28.10) krb5 replay error triggers log file DoS with Safari --- Key: HADOOP-9521 URL: https://issues.apache.org/jira/browse/HADOOP-9521 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.0.4-alpha Environment: Mac OS X 10.8.3, Safari 6.0.3 (8536.28.10) Reporter: Allen Wittenauer Priority: Blocker While investigating YARN-621, looking at the web interface with Safari triggered a loop which both filled the log with stack traces as well as left the browser in a continual loading situation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9522) web interfaces are not logged until after opening
Allen Wittenauer created HADOOP-9522: Summary: web interfaces are not logged until after opening Key: HADOOP-9522 URL: https://issues.apache.org/jira/browse/HADOOP-9522 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer If one mis-configures certain interfaces (in my case yarn.resourcemanager.webapp.address), neither Hadoop nor jetty throws any errors that the interface doesn't exist. Worse yet, the system appears to be hung. It would be better if we logged what hostname:port we were attempting to open before we opened it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9520) _HOST doesn't resolve to bound interface
[ https://issues.apache.org/jira/browse/HADOOP-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645913#comment-13645913 ] Allen Wittenauer commented on HADOOP-9520: -- FWIW, I'm fully expecting to fix this bug myself, like I did for our branch-1 install. People wanted me to file bugs. I did, and got the fully expected pushback that ops teams are required to hard-code everything (despite this being completely unnecessary and mostly unintuitive vs ~4 code change). _HOST doesn't resolve to bound interface Key: HADOOP-9520 URL: https://issues.apache.org/jira/browse/HADOOP-9520 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer _HOST appears to ignore bound interfaces. For example, if a host has two interfaces such that: nic0 = gethostname() nic1 = someothername and then I configure the namenode or resource manager to use someothername:, the system still treats _HOST = nic0. This is especially harmful for Kerberos principals. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9521) krb5 replay error triggers log file DoS with Safari
[ https://issues.apache.org/jira/browse/HADOOP-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645929#comment-13645929 ] Allen Wittenauer commented on HADOOP-9521: -- I don't have anything but 6.0.3 here to test against. Stack trace is what you'd expect: {code}
2013-04-30 19:58:54,576 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: Authentication exception: GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is a replay (34))
org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is a replay (34))
    at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:329)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:349)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:384)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1069)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is a replay (34))
    at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:741)
    at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:323)
    at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:267)
    at sun.security.jgss.spnego.SpNegoContext.GSS_acceptSecContext(SpNegoContext.java:874)
    at sun.security.jgss.spnego.SpNegoContext.acceptSecContext(SpNegoContext.java:541)
    at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:323)
    at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:267)
    at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:299)
    at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:291)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:291)
    ... 25 more
Caused by: KrbException: Request is a replay (34)
    at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:298)
    at sun.security.krb5.KrbApReq.init(KrbApReq.java:134)
    at sun.security.jgss.krb5.InitSecContextToken.init(InitSecContextToken.java:79)
    at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:724)
    ... 36 more
{code} krb5 replay error triggers log file DoS with Safari --- Key: HADOOP-9521
[jira] [Updated] (HADOOP-9521) krb5 replay error triggers log file DoS with Safari
[ https://issues.apache.org/jira/browse/HADOOP-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9521: - Environment: Mac OS X 10.8.3, Safari 6.0.3 (8536.28.10) Mac OS X 10.6.8, Safari 6.0.3 (8536.28.10) was:Mac OS X 10.8.3, Safari 6.0.3 (8536.28.10) krb5 replay error triggers log file DoS with Safari --- Key: HADOOP-9521 URL: https://issues.apache.org/jira/browse/HADOOP-9521 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.0.4-alpha Environment: Mac OS X 10.8.3, Safari 6.0.3 (8536.28.10) Mac OS X 10.6.8, Safari 6.0.3 (8536.28.10) Reporter: Allen Wittenauer Priority: Blocker While investigating YARN-621, looking at the web interface with Safari triggered a loop which both filled the log with stack traces as well as left the browser in a continual loading situation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9710) Modify security layer to support QoP based on ports
[ https://issues.apache.org/jira/browse/HADOOP-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9710: - Description: Hadoop Servers currently support only one quality of protection (QOP) for all of the cluster. This jira allows a server to have different QOP on different ports. The QOP is set based on the port. was: Hadoop Servers currently support only one QOP for all of the cluster. This jira allows a server to have different QOP on different ports. The QOP is set based on the port. Summary: Modify security layer to support QoP based on ports (was: Modify security layer to support QOP based on ports) Modify security layer to support QoP based on ports Key: HADOOP-9710 URL: https://issues.apache.org/jira/browse/HADOOP-9710 Project: Hadoop Common Issue Type: Improvement Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HADOOP-9710.patch Hadoop Servers currently support only one quality of protection (QOP) for all of the cluster. This jira allows a server to have different QOP on different ports. The QOP is set based on the port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
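To illustrate the HADOOP-9710 idea, a per-port QOP lookup might behave like the toy table below. The port numbers and the mapping are invented for illustration; the QOP values themselves are the standard SASL ones (auth = authentication only, auth-int = + integrity, auth-conf = + privacy).

```shell
#!/usr/bin/env sh
# Toy per-port QOP table; standard SASL QOP values are
# auth, auth-int, and auth-conf. Ports and mapping are invented.
qop_for_port() {
  case "$1" in
    8020) echo "auth" ;;        # intra-cluster traffic: cheapest QOP
    8021) echo "auth-conf" ;;   # external clients: full privacy
    *)    echo "auth" ;;        # default
  esac
}
qop_for_port 8021   # prints auth-conf
```

The point of the jira is exactly this shape of decision: one server, two listening ports, each port carrying its own negotiated protection level.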
[jira] [Updated] (HADOOP-9777) RPM should not claim ownership of paths owned by the platform
[ https://issues.apache.org/jira/browse/HADOOP-9777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9777: - Priority: Critical (was: Major) RPM should not claim ownership of paths owned by the platform - Key: HADOOP-9777 URL: https://issues.apache.org/jira/browse/HADOOP-9777 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 1.1.2 Environment: Fedora 19 x64 Reporter: Stevo Slavic Priority: Critical Installing the Apache Hadoop rpm ( hadoop-1.1.2-1.x86_64.rpm ) on Fedora 19 x64 fails with: {noformat}
[root@laptop hadoop]# rpm -i /home/sslavic/Downloads/hadoop-1.1.2-1.x86_64.rpm
file /usr/bin from install of hadoop-1.1.2-1.x86_64 conflicts with file from package filesystem-3.2-12.fc19.x86_64
file /usr/lib from install of hadoop-1.1.2-1.x86_64 conflicts with file from package filesystem-3.2-12.fc19.x86_64
file /usr/lib64 from install of hadoop-1.1.2-1.x86_64 conflicts with file from package filesystem-3.2-12.fc19.x86_64
file /usr/sbin from install of hadoop-1.1.2-1.x86_64 conflicts with file from package filesystem-3.2-12.fc19.x86_64
{noformat} The same issue occurs if one tries to install as a non-root user: {noformat}
[sslavic@laptop ~]$ sudo rpm -i Downloads/hadoop-1.1.2-1.x86_64.rpm
file /usr/bin from install of hadoop-1.1.2-1.x86_64 conflicts with file from package filesystem-3.2-12.fc19.x86_64
file /usr/lib from install of hadoop-1.1.2-1.x86_64 conflicts with file from package filesystem-3.2-12.fc19.x86_64
file /usr/lib64 from install of hadoop-1.1.2-1.x86_64 conflicts with file from package filesystem-3.2-12.fc19.x86_64
file /usr/sbin from install of hadoop-1.1.2-1.x86_64 conflicts with file from package filesystem-3.2-12.fc19.x86_64
{noformat} It seems these 4 directories in the Hadoop rpm have the wrong permissions (+w for owner). This is a violation of packaging rules. The Hadoop rpm spec and/or build scripts need to be fixed so that the rpm on installation doesn't try to claim ownership of paths owned by the platform (in this case, the filesystem package). 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9870) Mixed configurations for JVM -Xmx in hadoop command
[ https://issues.apache.org/jira/browse/HADOOP-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739896#comment-13739896 ] Allen Wittenauer commented on HADOOP-9870: -- Is there something inherently wrong with letting the JVM make the decision? Are we worried about a JVM that doesn't follow the same set of rules? (which, at this point, is a de facto API) Mixed configurations for JVM -Xmx in hadoop command --- Key: HADOOP-9870 URL: https://issues.apache.org/jira/browse/HADOOP-9870 Project: Hadoop Common Issue Type: Bug Reporter: Wei Yan When we use the hadoop command to launch a class, there are two places setting the -Xmx configuration. *1*. The first place is located in file {{hadoop-common-project/hadoop-common/src/main/bin/hadoop}}: {code} exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@" {code} Here $JAVA_HEAP_MAX is configured in hadoop-config.sh ({{hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh}}). The default value is -Xmx1000m. *2*. The second place is set with $HADOOP_OPTS in file {{hadoop-common-project/hadoop-common/src/main/bin/hadoop}}: {code} HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS" {code} Here $HADOOP_CLIENT_OPTS is set in hadoop-env.sh ({{hadoop-common-project/hadoop-common/src/main/conf/hadoop-env.sh}}): {code} export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS" {code} Currently the final default java command looks like: {code} java -Xmx1000m -Xmx512m CLASS_NAME ARGUMENTS {code} And if users also specify -Xmx in $HADOOP_CLIENT_OPTS, there will be three -Xmx configurations. The hadoop setup tutorial only discusses hadoop-env.sh, and it looks like users should not make any change in hadoop-config.sh. We should make hadoop smart enough to choose the right -Xmx before launching the java command, instead of leaving it to the JVM to decide. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
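On the HADOOP-9870 question above: HotSpot processes command-line flags left to right, so with `java -Xmx1000m -Xmx512m ...` the last `-Xmx` wins and the JVM runs with 512m. If the scripts wanted to tidy the command line themselves rather than rely on that behaviour, one sketch (the function name is invented) would be to drop all but the last `-Xmx`:

```shell
#!/usr/bin/env sh
# Keep only the last -XmxNNN flag in an option string, mimicking the
# "last flag wins" rule HotSpot applies anyway.
last_xmx_only() {
  out=""
  last=""
  for opt in $1; do
    case "${opt}" in
      -Xmx*) last="${opt}" ;;        # remember only the most recent -Xmx
      *)     out="${out} ${opt}" ;;  # pass everything else through
    esac
  done
  echo "${out} ${last}" | sed 's/^ *//; s/ *$//'
}
last_xmx_only "-Xmx1000m -Xmx512m -Dfoo=bar"
```

This only handles whitespace-separated options; a production version would also have to cope with quoted values, which is part of why deferring to the JVM is attractive.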
[jira] [Commented] (HADOOP-9874) hadoop.security.logger output goes to both logs
[ https://issues.apache.org/jira/browse/HADOOP-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740473#comment-13740473 ] Allen Wittenauer commented on HADOOP-9874: -- I'm fairly certain this is a regression as well, but I can't verify that at the moment. hadoop.security.logger output goes to both logs --- Key: HADOOP-9874 URL: https://issues.apache.org/jira/browse/HADOOP-9874 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Allen Wittenauer Setting hadoop.security.logger (for SecurityLogger messages) to non-null sends authentication information to the other log as specified. However, that logging information also goes to the main log. It should only go to one log, not both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9874) hadoop.security.logger output goes to both logs
Allen Wittenauer created HADOOP-9874: Summary: hadoop.security.logger output goes to both logs Key: HADOOP-9874 URL: https://issues.apache.org/jira/browse/HADOOP-9874 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Allen Wittenauer Setting hadoop.security.logger (for SecurityLogger messages) to non-null sends authentication information to the other log as specified. However, that logging information also goes to the main log. It should only go to one log, not both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9874) hadoop.security.logger output goes to both logs
[ https://issues.apache.org/jira/browse/HADOOP-9874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740504#comment-13740504 ] Allen Wittenauer commented on HADOOP-9874: -- Sure, but we should do the correct thing out of the box. hadoop.security.logger output goes to both logs --- Key: HADOOP-9874 URL: https://issues.apache.org/jira/browse/HADOOP-9874 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Allen Wittenauer Setting hadoop.security.logger (for SecurityLogger messages) to non-null sends authentication information to the other log as specified. However, that logging information also goes to the main log. It should only go to one log, not both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9884) Hadoop calling du -sk is expensive
[ https://issues.apache.org/jira/browse/HADOOP-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744184#comment-13744184 ] Allen Wittenauer commented on HADOOP-9884: -- We need to tread carefully here. Replacing the du call has the potential to break the distributed cache (and probably other things), especially for non-HDFS-based systems. Hadoop calling du -sk is expensive -- Key: HADOOP-9884 URL: https://issues.apache.org/jira/browse/HADOOP-9884 Project: Hadoop Common Issue Type: Improvement Reporter: Alex Newman On numerous occasions we've had customers worry about slowness while hadoop calls du -sk under the hood. For most of these users, getting the information from df would be sufficient and much faster. In fact, there is a quite common hack going around that replaces du with df. Sometimes people have to tune the vcache. What if we just allowed users to use the df information instead of the du information, with a patch and a config setting? I'd be glad to code it up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
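The cost difference behind HADOOP-9884 is easy to see: `du -sk` walks every file under a directory (cost grows with file count), while `df -k` reads per-filesystem counters in roughly constant time, at the price of reporting whole-filesystem usage rather than per-directory usage. A stand-alone sketch using a temporary directory:

```shell
#!/usr/bin/env sh
# du walks the tree (O(files)); df reads filesystem counters (O(1)),
# but only knows about the filesystem as a whole, not one directory.
d="$(mktemp -d)"
dd if=/dev/zero of="${d}/blob" bs=1024 count=64 2>/dev/null

# Per-directory usage: what Hadoop gets from its 'du -sk' call.
used_kb="$(du -sk "${d}" | cut -f1)"

# Per-filesystem usage: what a df-based shortcut would report instead.
fs_used_kb="$(df -k "${d}" | awk 'NR==2 {print $3}')"

echo "du: ${used_kb} KB in ${d}; df: ${fs_used_kb} KB used on its filesystem"
rm -rf "${d}"
```

The mismatch in granularity is exactly why swapping one for the other can break consumers like the distributed cache, as the comment above warns.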
[jira] [Created] (HADOOP-9902) Shell script rewrite
Allen Wittenauer created HADOOP-9902: Summary: Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Umbrella JIRA for shell script rewrite. See first comment for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Description: Umbrella JIRA for shell script rewrite. See more-info.txt for more details. (was: Umbrella JIRA for shell script rewrite. See first comment for more details.) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: more-info.txt Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: scripts.tgz Just to give an idea of what I'm thinking, here is a sample. Note this is a) not even close to final, b) likely has bugs, c) is very incomplete, and d) hasn't been fully optimized at all. This is for 2.1.0. Sorry for not being in patch format, but I'm not at that stage yet. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: more-info.txt, scripts.tgz Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749657#comment-13749657 ] Allen Wittenauer commented on HADOOP-9902: -- Adding a bunch of links to JIRAs for xref to when various things got added. A quick read leaves me with one impression: YARN is incredibly inconsistent and its attempts to make things easier have actually made things harder for both the user and the developer. Worse, a lot of the stuff is completely undocumented outside of JIRAs. I don't know if this situation is salvageable without undoing some of this nonsense. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: more-info.txt, scripts.tgz Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750154#comment-13750154 ] Allen Wittenauer commented on HADOOP-9902: -- Question for the crowd. In bin/yarn is... this: {code}
# for developers, add Hadoop classes to CLASSPATH
if [ -d $HADOOP_YARN_HOME/yarn-api/target/classes ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_YARN_HOME/yarn-api/target/classes
fi
if [ -d $HADOOP_YARN_HOME/yarn-common/target/classes ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_YARN_HOME/yarn-common/target/classes
fi
if [ -d $HADOOP_YARN_HOME/yarn-mapreduce/target/classes ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_YARN_HOME/yarn-mapreduce/target/classes
fi
if [ -d $HADOOP_YARN_HOME/yarn-master-worker/target/classes ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_YARN_HOME/yarn-master-worker/target/classes
fi
if [ -d $HADOOP_YARN_HOME/yarn-server/yarn-server-nodemanager/target/classes ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_YARN_HOME/yarn-server/yarn-server-nodemanager/target/classes
fi
if [ -d $HADOOP_YARN_HOME/yarn-server/yarn-server-common/target/classes ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_YARN_HOME/yarn-server/yarn-server-common/target/classes
fi
if [ -d $HADOOP_YARN_HOME/yarn-server/yarn-server-resourcemanager/target/classes ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_YARN_HOME/yarn-server/yarn-server-resourcemanager/target/classes
fi
if [ -d $HADOOP_YARN_HOME/build/test/classes ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_YARN_HOME/target/test/classes
fi
if [ -d $HADOOP_YARN_HOME/build/tools ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_YARN_HOME/build/tools
fi
{code} [I'm pretty sure at this point in the execution path, the YARN jars from the non-build directories have already been inserted into the classpath via the early call into hadoop-config.sh... which means this code likely isn't working as intended. For now, let's assume that it is.]
After cleanup, it looks a bit more like this, using the before option to push the entries to the front of the classpath and reversing to maintain the pathing order. [altho I suspect that a) we can trim this down even further with an ls -d and b) ordering doesn't matter]: {code}
add_classpath $HADOOP_YARN_HOME/build/tools before
add_classpath $HADOOP_YARN_HOME/build/test/classes before
for debugpath in yarn-server-resourcemanager yarn-server-common yarn-server-nodemanager \
    yarn-master-worker yarn-mapreduce yarn-common yarn-api; do
  add_classpath $HADOOP_YARN_HOME/$debugpath/target/classes before
done
{code} Since this is buried in bin/yarn, this is only getting set if the yarn command is being used. This might lead to some interesting situations where we're running test yarn code on stable HDFS. This may or may not be desirable. So now the question: *Should test classpaths always be inserted if we detect them?* Your choices:
a) We actually cover this as part of the unit tests. Strip all this stuff out so our commands run faster!
b) Keep the debug code per-section. i.e., the hdfs command will only get hdfs and common test code, the yarn command will get the yarn and common test code, the hadoop command only gets common.
c) Everyone gets everything. i.e., using the hdfs command will add in the yarn test code.
Reminder: hadoop-config.sh adds in *all* of the classpaths we know about. I don't think this is fixable without breaking compatibility in a major way. (Changing the 'hadoop classpath' command to show all paths is certainly do-able but who knows what *else* would break...) Thoughts? Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: more-info.txt, scripts.tgz Umbrella JIRA for shell script rewrite. See more-info.txt for more details. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
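The add_classpath helper used in the cleanup above isn't shown in the comment. A minimal sketch of what such a function might look like, with the 'before' option pushing entries to the front of the list (assumption: the real function in the HADOOP-9902 patch may be named and structured differently):

```shell
# Hypothetical sketch of an add_classpath helper as used in the
# cleanup example above. Illustrative only; not the patch's code.
add_classpath() {
  local path="$1" position="${2:-after}"
  # Only add paths that actually exist, mirroring the
  # "if [ -d ... ]" guards this construct replaces.
  if [ ! -d "$path" ] && [ ! -f "$path" ]; then
    return 1
  fi
  if [ -z "$CLASSPATH" ]; then
    CLASSPATH="$path"
  elif [ "$position" = "before" ]; then
    CLASSPATH="$path:$CLASSPATH"
  else
    CLASSPATH="$CLASSPATH:$path"
  fi
}
```

With this shape, the loop in the comment prepends each existing target/classes directory, and nonexistent build directories are silently skipped via the return code.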
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750233#comment-13750233 ] Allen Wittenauer commented on HADOOP-9902: -- or a 4th option: d) set HADOOP_BUILD_DEBUG=sub sub ... which would only enable the classpath for the subprojects listed. (i.e., HADOOP_BUILD_DEBUG=hdfs yarn would enable both hdfs and yarn but not common.) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: more-info.txt, scripts.tgz Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
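Option d) could be implemented with a simple membership test over the variable's whitespace-separated list. A sketch, assuming the proposed (not shipped) HADOOP_BUILD_DEBUG semantics; the function name is hypothetical:

```shell
# Sketch of option (d): enable build classpaths only for subprojects
# named in HADOOP_BUILD_DEBUG, e.g. HADOOP_BUILD_DEBUG="hdfs yarn".
# HADOOP_BUILD_DEBUG is the proposal above, not an existing variable.
build_debug_enabled() {
  local want="$1" sub
  for sub in $HADOOP_BUILD_DEBUG; do   # relies on word splitting
    if [ "$sub" = "$want" ]; then
      return 0
    fi
  done
  return 1
}
```

Each subproject's script could then guard its test-classpath block with, e.g., `if build_debug_enabled yarn; then ... fi`.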
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756796#comment-13756796 ] Allen Wittenauer commented on HADOOP-9902: -- Digging into this further, it looks like YARN has a different build structure than HDFS, common, and mapreduce, which is why these extra classpaths aren't added. I'll see if I can work out what should be added and wrap them around a new flag (--buildpaths). Thanks! Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: more-info.txt, scripts.tgz Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: scripts.tgz Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: more-info.txt, scripts.tgz, scripts.tgz Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: (was: scripts.tgz) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: more-info.txt, scripts.tgz Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757341#comment-13757341 ] Allen Wittenauer commented on HADOOP-9902: -- Uploaded another mostly untested code drop with contents of bin/ and libexec/ to show progress, get some feedback, etc. Basic stuff does appear to work for me, but I haven't tried starting any daemons yet since I'm still working out the new secure DN starter code to be much more flexible. Plus I'm still working my way through sbin. A few things worth pointing out: Load order should be consistent now. Basic path is:
* bin/command sets HADOOP_NEW_CONFIG to disable auto-population. It then loads:
** xyz-config.sh
*** hadoop-config.sh hadoop-env.sh hadoop-functions.sh
*** xyz-env.sh - loading this here should allow for users to override quite a bit more, at least that's the hypothesis
* (do whatever)
* finalize - fills in any missing -D's
* exec java
This mainly has implications for YARN which did/does really oddball things with YARN_OPTS. There is bound to be some (edge-case?) breakage here, but (IMO) consistency is more important. I tried to 'make it work', but... Misc.
* users can override functions in hadoop-env.sh. This means if they need extra/replacement functionality, totally doable, without replacing anything in libexec. I might make a specific call out
* double-dash options (i.e., --config) are handled by the same code, consistently, in hadoop-config.sh. Also, since this is a loop, the order of the options no longer matters, except for --config (for what are hopefully obvious reasons). --help and friends work by having the top level define a function called usage().
* Most/all of the crazy if/fi constructions (esp those buried inside a case!) have been replaced with a single-parent case statement. Also, an effort has been made to mostly alphabetize the commands in the case statement, although I'm sure I missed one or two. 
* Option C from above has been implemented. I think. ;)
* I haven't touched httpfs yet at all.
* You can see some previews of some of the stuff in sbin. For example, slaves.sh now uses pdsh if it is installed.
* LD_LIBRARY_PATH, CLASSPATH, JAVA_LIBRARY_PATH are now de-duped.
Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: more-info.txt, scripts.tgz Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
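The de-duplication mentioned for LD_LIBRARY_PATH, CLASSPATH, and JAVA_LIBRARY_PATH can be done with a small loop over the colon-separated entries, keeping the first occurrence of each. A sketch (the rewrite's actual helper may be named and structured differently):

```shell
# Sketch of colon-separated list de-duplication: keep the first
# occurrence of each entry, preserving order. Illustrative only.
dedupe_path() {
  local input="$1" out="" entry
  local oldifs="$IFS"
  IFS=':'
  set -f                        # don't glob-expand entries like .../*
  for entry in $input; do
    case ":$out:" in
      *":$entry:"*) ;;          # duplicate, skip
      *) out="${out:+$out:}$entry" ;;
    esac
  done
  set +f
  IFS="$oldifs"
  printf '%s\n' "$out"
}
```

Usage would be along the lines of `CLASSPATH=$(dedupe_path "$CLASSPATH")` just before exec'ing java, which is where most of the repeated conf-dir and jar-glob entries in the "before" example below get squeezed out.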
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757349#comment-13757349 ] Allen Wittenauer commented on HADOOP-9902: -- Oh, one other thing: * removed rm-config/log4j.properties and nm-config/log4j.properties support. These appear to be completely undocumented. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: more-info.txt, scripts.tgz Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13763221#comment-13763221 ] Allen Wittenauer commented on HADOOP-9902: -- Would anyone miss any of the following YARN properties being defined: * yarn.id.str * yarn.home.dir * yarn.policy.file None of these are used in the Hadoop source and don't appear to be documented. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: more-info.txt, scripts.tgz Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768713#comment-13768713 ] Allen Wittenauer commented on HADOOP-9902: -- Since I'm getting ready to post a patch, how about an 'end result' example! Here is the command line for the resource manager from my real, 100+ node test grid. Before the changes: {code} /usr/java/default/bin/java -Dproc_resourcemanager -Xmx1000m -Xmx24g -Dyarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log -Dyarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY -Xloggc:/export/apps/hadoop/logs/gc-nn.log-201308261726 -Dcom.sun.management.jmxremote.port=9010 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dhadoop.log.dir=/export/apps/hadoop/logs -Dyarn.log.dir=/export/apps/hadoop/logs -Dhadoop.log.file=yarn-yarn-resourcemanager-eat1-hcl4083.grid.linkedin.com.log -Dyarn.log.file=yarn-yarn-resourcemanager-eat1-hcl4083.grid.linkedin.com.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,DRFA -Dyarn.root.logger=INFO,DRFA -Djava.library.path=/export/apps/hadoop/latest/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/export/apps/hadoop/logs -Dyarn.log.dir=/export/apps/hadoop/logs -Dhadoop.log.file=yarn-yarn-resourcemanager-eat1-hcl4083.grid.linkedin.com.log -Dyarn.log.file=yarn-yarn-resourcemanager-eat1-hcl4083.grid.linkedin.com.log -Dyarn.home.dir=/export/apps/hadoop/latest -Dhadoop.home.dir=/export/apps/hadoop/latest -Dhadoop.root.logger=INFO,DRFA -Dyarn.root.logger=INFO,DRFA -Djava.library.path=/export/apps/hadoop/latest/lib/native -classpath /export/apps/hadoop/site/etc/hadoop /export/apps/hadoop/site/etc/hadoop /export/apps/hadoop/site/etc/hadoop /export/apps/hadoop/latest/share/hadoop/common/lib/* /export/apps/hadoop/latest/share/hadoop/common/* 
/export/apps/hadoop/latest/share/hadoop/hdfs /export/apps/hadoop/latest/share/hadoop/hdfs/lib/* /export/apps/hadoop/latest/share/hadoop/hdfs/* /export/apps/hadoop/latest/share/hadoop/yarn/lib/* /export/apps/hadoop/latest/share/hadoop/yarn/* /export/apps/hadoop/latest/share/hadoop/mapreduce/lib/* /export/apps/hadoop/latest/share/hadoop/mapreduce/* /export/apps/hadoop/site/lib/grid-topology-1.0.jar /export/apps/hadoop/latest/contrib/capacity-scheduler/*.jar /export/apps/hadoop/site/lib/grid-topology-1.0.jar /export/apps/hadoop/latest/contrib/capacity-scheduler/*.jar /export/apps/hadoop/site/lib/grid-topology-1.0.jar /export/apps/hadoop/latest/contrib/capacity-scheduler/*.jar /export/apps/hadoop/latest/share/hadoop/yarn/* /export/apps/hadoop/latest/share/hadoop/yarn/lib/* /export/apps/hadoop/site/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager {code} After the changes: {code} /usr/java/default/bin/java -Dproc_resourcemanager -Xloggc:/export/apps/hadoop/logs/gc-nn.log-201309162014 -Dcom.sun.management.jmxremote.port=9010 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Xmx24g -Dyarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log -Dyarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY -Dyarn.log.dir=/export/apps/hadoop/logs -Dyarn.log.file=yarn-yarn-resourcemanager-eat1-hcl4083.grid.linkedin.com.log -Dyarn.home.dir=/export/apps/hadoop/latest -Dyarn.root.logger=INFO,DRFA -Djava.library.path=/export/apps/hadoop/latest/lib/native -Dhadoop.log.dir=/export/apps/hadoop/logs -Dhadoop.log.file=yarn-yarn-resourcemanager-eat1-hcl4083.grid.linkedin.com.log -Dhadoop.home.dir=/export/apps/hadoop/latest -Dhadoop.id.str=yarn -Dhadoop.root.logger=INFO,DRFA -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender 
-Djava.net.preferIPv4Stack=true -classpath /export/apps/hadoop/site/lib/grid-topology-1.0.jar /export/apps/hadoop/latest/contrib/capacity-scheduler/*.jar /export/apps/hadoop/site/etc/hadoop /export/apps/hadoop/latest/share/hadoop/common/lib/* /export/apps/hadoop/latest/share/hadoop/common/* /export/apps/hadoop/latest/share/hadoop/hdfs /export/apps/hadoop/latest/share/hadoop/hdfs/lib/* /export/apps/hadoop/latest/share/hadoop/hdfs/* /export/apps/hadoop/latest/share/hadoop/yarn/lib/* /export/apps/hadoop/latest/share/hadoop/yarn/* /export/apps/hadoop/latest/share/hadoop/mapreduce/lib/* /export/apps/hadoop/latest/share/hadoop/mapreduce/* org.apache.hadoop.yarn.server.resourcemanager.ResourceManager {code} 2500 bytes vs. 1750 bytes, almost all the savings are from the classpath. There are still a few problems with the 'after' output but... they are mainly from my local config and not coming from the scripts. :) Shell script rewrite
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: (was: scripts.tgz) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Attachment: hadoop-9902-1.patch Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768889#comment-13768889 ] Allen Wittenauer commented on HADOOP-9902: -- Removed the tarball. Added a patch. This still needs a lot of testing and some of the features aren't quite complete (start-dfs.sh firing off secure datanodes, for example). httpfs hasn't been touched. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784664#comment-13784664 ] Allen Wittenauer commented on HADOOP-9902: -- These are sort of out of order. bq. playing with this. sometimes the generated classpath is, say, share/hadoop/yarn/*; the capacity scheduler is /*.jar - should everything be consistent? At one point I thought about processing the regex string to dedupe it down to the jar level. This opens up a big can of worms, however: if you hit two of them, do you always take the latest? What does latest mean anyway (date or version)? Will we be able to parse the version out of the filename? How do we deal with user overrides? Still take the latest no matter what? I've opted to basically let the classpath as it is passed to us stand. Currently the dedupe code is pretty fast for interpreted shell. :) The *only* sub-optimization that I might be tempted to do is to normalize any symlinks and relative paths. There is a good chance we'll catch a few dupes this way... but it likely isn't worth the extra execution time. It's worth pointing out that a user can feasibly replace the add_classpath code in hadoop-env.sh to override the functionality without changing the base Apache code if they want/need more advanced classpath handling. (e.g., HADOOP-6997 seems to be a non-issue to me since passing duplicate class names is just bad practice; changing the collation is fixing a symptom of a much bigger/dangerous problem. But someone facing this issue could theoretically fix a collation problem on their own, legally in a stable way using this trick.) bq. I don't see hadoop tools getting on the CP: is there a plan for that? Tools path gets added as needed. I seem to recall this is exactly the same way in the current shell scripts. bq. 
Because it would suit me to have a directory into which I could put things to get them on a classpath without playing with HADOOP_CLASSPATH I was planning on bringing up this exact issue after I get this one committed. It's a harder discussion because the placement is tricky and there are a lot of options to make this functionality happen. Do we add another env var? Do we just auto-prepend $HADOOP_PREFIX/lib/share/site/*? Do we offer both prepend and append options? etc etc. All have pros and cons. Some of the choices become feasible really only after this is committed, however. bq. we do need to think when and how to react to (conf dir) absence Good point. That's pretty easy to add given that the conf dir handling is fairly well contained now in the hadoop_find_confdir function in hadoop-functions.sh. It's pretty trivial to throw a fatal error if we don't detect, say, hadoop-env.sh in what we resolved HADOOP_CONF_DIR to. Suggestions on what to check for? bq. actually a rebuild fixes that. What I did have to do was drop hadoop-functions.sh into libexec Yeah, after commit this is pretty much a flag day for all of the Hadoop subprojects. I talked to a few folks about it and it was generally felt that this should be one big patch+JIRA rather than several smaller ones per project given the interdependency on common. We'll have to advertise on the various -dev mailing lists post commit to say do a full rebuild. Hopefully folks won't have to change their *-env.sh files and they will continue without modification, however. Thanks! Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.1#6144)
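The conf-dir sanity check discussed above (a fatal error when hadoop-env.sh isn't found in the resolved HADOOP_CONF_DIR) could look something like this sketch; `verify_confdir` is a hypothetical name, not a function from the patch:

```shell
# Hypothetical conf-dir validation as discussed: fail fast when the
# resolved HADOOP_CONF_DIR doesn't look like a real config directory.
# Checking for hadoop-env.sh is the suggestion from the thread above.
verify_confdir() {
  local confdir="$1"
  if [ ! -d "$confdir" ] || [ ! -f "$confdir/hadoop-env.sh" ]; then
    echo "ERROR: Cannot find a valid HADOOP_CONF_DIR at $confdir" >&2
    return 1
  fi
}
```

A check like this would slot naturally at the end of hadoop_find_confdir, turning a silently empty config into an immediate, explainable failure.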
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785489#comment-13785489 ] Allen Wittenauer commented on HADOOP-9902: -- Agreed. The edge cases are too painful. The only dupe jar detection that occurs now is some extremely simple string match. So if someone does something like $DIR/lib/blah.jar and $DIR/lib/../lib/blah.jar, it won't get deduped. (It does, however, verify that $DIR/lib and $DIR/lib/../lib exist!) Even with just this simple stuff, it eliminates multiple instances of the conf dir at a minimum. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-10034) optimize same-filesystem symlinks by doing resolution server-side
[ https://issues.apache.org/jira/browse/HADOOP-10034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791005#comment-13791005 ] Allen Wittenauer commented on HADOOP-10034: --- Won't doing this preclude us ever adding real relative paths into HDFS? i.e., supporting .. optimize same-filesystem symlinks by doing resolution server-side - Key: HADOOP-10034 URL: https://issues.apache.org/jira/browse/HADOOP-10034 Project: Hadoop Common Issue Type: Sub-task Components: fs Reporter: Colin Patrick McCabe We should optimize same-filesystem symlinks by doing resolution server-side rather than client side, as discussed on HADOOP-9780. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877702#comment-13877702 ] Allen Wittenauer commented on HADOOP-9902: -- Adding a link to HADOOP-10177 and HDFS-4763 to include the changes added by those patches. (It should be noted that neither patch listed included CLI help info for the new sub-commands they added...) Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877709#comment-13877709 ] Allen Wittenauer commented on HADOOP-9902: -- Would anyone be too upset by a patch to trunk that removed the 'deprecated' status? i.e., no longer warning, etc? By then the deprecation will have been in a release, and we'd no longer support the HDFS and MR sub-commands. Shell script rewrite Key: HADOOP-9902 URL: https://issues.apache.org/jira/browse/HADOOP-9902 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 2.1.1-beta Reporter: Allen Wittenauer Assignee: Allen Wittenauer Attachments: hadoop-9902-1.patch, more-info.txt Umbrella JIRA for shell script rewrite. See more-info.txt for more details. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-7476) task-controller can drop last char from config file
[ https://issues.apache.org/jira/browse/HADOOP-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067978#comment-13067978 ] Allen Wittenauer commented on HADOOP-7476: -- While working on porting task-controller, I ran into getline():
{code}
size_read = getline(&line, &linesize, conf_file);
//feof returns true only after we read past EOF.
//so a file with no new line, at last can reach this place
//if size_read returns negative check for eof condition
if (size_read == -1) {
  if (!feof(conf_file)) {
    fprintf(LOGFILE, "getline returned error.\n");
    exit(INVALID_CONFIG_FILE);
  } else {
    free(line);
    break;
  }
}
//trim the ending new line
line[strlen(line)-1] = '\0';
//comment line
{code}
My read of this code says that we always remove the last character of the buffer prior to the null termination. In the vast majority of cases, this should be \n. However, getline() doesn't appear to guarantee this: "The buffer is null-terminated and includes the newline character, if one was found." If the configuration file was built in such a way that it does not end with a newline, it will chop off the last character. task-controller can drop last char from config file --- Key: HADOOP-7476 URL: https://issues.apache.org/jira/browse/HADOOP-7476 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 0.20.203.0 Reporter: Allen Wittenauer Priority: Trivial It looks as though task-controller's configuration file reader assumes that the output of getline() always ends with \n\0. This assumption does not appear to be safe. See comments for more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7476) task-controller can drop last char from config file
[ https://issues.apache.org/jira/browse/HADOOP-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067988#comment-13067988 ] Allen Wittenauer commented on HADOOP-7476: -- sure is, missed that one. thanks. task-controller can drop last char from config file --- Key: HADOOP-7476 URL: https://issues.apache.org/jira/browse/HADOOP-7476 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 0.20.203.0 Reporter: Allen Wittenauer Priority: Trivial It looks as though task-controller's configuration file reader assumes that the output of getline() always ends with \n\0. This assumption does not appear to be safe. See comments for more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7476) task-controller can drop last char from config file
[ https://issues.apache.org/jira/browse/HADOOP-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13068073#comment-13068073 ] Allen Wittenauer commented on HADOOP-7476: -- OS X and Solaris. As usual, this typically means removing the GNU-only crud*. At this point, my task-controller uses fgetln() instead of getline(). Since I'm lazy, it is easier to find code we can import that implements fgetln() in a portable fashion than getline(). (Altho if getline() is present w/out fgetln(), I've got a wrapper that implements fgetln() with getline()). * Technically, getline() was added to super-recent POSIX, but none of the platforms that I have access to have that other than glibc-based machines. So it isn't that portable yet. :( task-controller can drop last char from config file --- Key: HADOOP-7476 URL: https://issues.apache.org/jira/browse/HADOOP-7476 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 0.20.203.0 Reporter: Allen Wittenauer Priority: Trivial It looks as though task-controller's configuration file reader assumes that the output of getline() always ends with \n\0. This assumption does not appear to be safe. See comments for more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7371) Improve tarball distributions
[ https://issues.apache.org/jira/browse/HADOOP-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072972#comment-13072972 ] Allen Wittenauer commented on HADOOP-7371: -- bq. Sources are compressed to a jar file as $HADOOP_PREFIX/share/hadoop/hadoop-source-[version].jar, Javadoc is compressed as $HADOOP_PREFIX/share/javadoc/hadoop-javadoc-[version].jar Do we really want to use jar for these? This could lead to massive confusion. Besides, if these are part of the *tarball* distribution, the user clearly has *tar* available... Improve tarball distributions - Key: HADOOP-7371 URL: https://issues.apache.org/jira/browse/HADOOP-7371 Project: Hadoop Common Issue Type: Improvement Components: build Environment: Java 6, Redhat 5.5 Reporter: Eric Yang Assignee: Eric Yang Fix For: 0.23.0 Attachments: HADOOP-7371.patch Hadoop release tarball contains both raw source and binary. This leads users to use the release tarball as base for applying patches, to build custom Hadoop. This is not the recommended method to develop hadoop because it leads to mixed development system where processed files and raw source are hard to separate. To correct the problematic usage of the release tarball, the release build target should be defined as: ant source generates source release tarball. ant binary is binary release without source/javadoc jar files. ant tar is a mirror of binary release with source/javadoc jar files. Does this sound reasonable? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7371) Improve tarball distributions
[ https://issues.apache.org/jira/browse/HADOOP-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073419#comment-13073419 ] Allen Wittenauer commented on HADOOP-7371: -- Why would Eclipse users use the tarball? Besides, don't Eclipse users have other things they need to do before they can actually do things with Hadoop? Improve tarball distributions - Key: HADOOP-7371 URL: https://issues.apache.org/jira/browse/HADOOP-7371 Project: Hadoop Common Issue Type: Improvement Components: build Environment: Java 6, Redhat 5.5 Reporter: Eric Yang Assignee: Eric Yang Fix For: 0.23.0 Attachments: HADOOP-7371.patch Hadoop release tarball contains both raw source and binary. This leads users to use the release tarball as base for applying patches, to build custom Hadoop. This is not the recommended method to develop hadoop because it leads to mixed development system where processed files and raw source are hard to separate. To correct the problematic usage of the release tarball, the release build target should be defined as: ant source generates source release tarball. ant binary is binary release without source/javadoc jar files. ant tar is a mirror of binary release with source/javadoc jar files. Does this sound reasonable? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7371) Improve tarball distributions
[ https://issues.apache.org/jira/browse/HADOOP-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073662#comment-13073662 ] Allen Wittenauer commented on HADOOP-7371: -- FWIW, I'm not going to block this, but I still think it is going to lead to confusion, except for maybe the three people who debug production grids with eclipse. Improve tarball distributions - Key: HADOOP-7371 URL: https://issues.apache.org/jira/browse/HADOOP-7371 Project: Hadoop Common Issue Type: Improvement Components: build Environment: Java 6, Redhat 5.5 Reporter: Eric Yang Assignee: Eric Yang Fix For: 0.23.0 Attachments: HADOOP-7371.patch Hadoop release tarball contains both raw source and binary. This leads users to use the release tarball as base for applying patches, to build custom Hadoop. This is not the recommended method to develop hadoop because it leads to mixed development system where processed files and raw source are hard to separate. To correct the problematic usage of the release tarball, the release build target should be defined as: ant source generates source release tarball. ant binary is binary release without source/javadoc jar files. ant tar is a mirror of binary release with source/javadoc jar files. Does this sound reasonable? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7494) Add -c option for FSshell -tail
[ https://issues.apache.org/jira/browse/HADOOP-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13078315#comment-13078315 ] Allen Wittenauer commented on HADOOP-7494: -- What happens when this is used against non-HDFS or for large values of -c? Add -c option for FSshell -tail --- Key: HADOOP-7494 URL: https://issues.apache.org/jira/browse/HADOOP-7494 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 0.23.0 Reporter: XieXianshan Assignee: XieXianshan Priority: Trivial Fix For: 0.23.0 Attachments: HADOOP-7494.patch Add the -c option for FSshell -tail to allow users to specify the output bytes(currently,it's -1024 by default). For instance: $ hdfs dfs -tail -c -10 /user/hadoop/xiexs or $ hdfs dfs -tail -c+10 /user/hadoop/xiexs -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7499) Add method for doing a sanity check on hostnames in NetUtils
[ https://issues.apache.org/jira/browse/HADOOP-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13078476#comment-13078476 ] Allen Wittenauer commented on HADOOP-7499: -- Does the test actually do a DNS lookup? Add method for doing a sanity check on hostnames in NetUtils Key: HADOOP-7499 URL: https://issues.apache.org/jira/browse/HADOOP-7499 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Fix For: 0.23.0 Attachments: HADOOP-7499.patch As part of MAPREDUCE-2489, we need a method in NetUtils to do a sanity check on hostnames -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7506) hadoopcommon build version cant be set from the maven commandline
[ https://issues.apache.org/jira/browse/HADOOP-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13079039#comment-13079039 ] Allen Wittenauer commented on HADOOP-7506: -- If we can't change the version # at build time, I don't think we'll be able to upgrade server side-only components without also upgrading all the clients. That's a major hit on the ops side. If that holds true, then we'll need to back out the maven patch before release if we can't fix this. hadoopcommon build version cant be set from the maven commandline - Key: HADOOP-7506 URL: https://issues.apache.org/jira/browse/HADOOP-7506 Project: Hadoop Common Issue Type: Sub-task Components: build Affects Versions: 0.23.0 Reporter: Giridharan Kesavan Assignee: Giridharan Kesavan Attachments: HADOOP-7506.PATCH pom.xml had to introduce hadoop.version property with the default value set to the snapshot version. If someone during build time want to override the version from maven command line they can do so by passing -Dhadoop.version=. For ppl who doesnt want to change the default version can continue building. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7519) hadoop fs commands should support tar/gzip or an equivalent
[ https://issues.apache.org/jira/browse/HADOOP-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080114#comment-13080114 ] Allen Wittenauer commented on HADOOP-7519: -- BTW, I'm fairly certain that distcp works against file:// . hadoop fs commands should support tar/gzip or an equivalent --- Key: HADOOP-7519 URL: https://issues.apache.org/jira/browse/HADOOP-7519 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 0.20.1 Reporter: Keith Wiley Priority: Minor Labels: hadoop The hadoop fs subcommand should offer options for batching, unbatching, compressing, and uncompressing files on hdfs. The equivalent of hadoop fs -tar or hadoop fs -gzip. These commands would greatly facilitate moving large data (especially in a large number of files) back and forth from hdfs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7521) bintar created tarball should use a common directory for prefix
[ https://issues.apache.org/jira/browse/HADOOP-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080121#comment-13080121 ] Allen Wittenauer commented on HADOOP-7521: -- -1 This completely breaks with customary tar ball behavior. The expectation when you unpack a tarball is that it will be in (pkgname)-(version). Users are *expecting* to have component separation and in many cases *prefer* component separation. If someone wants a more integrated experience, they'll use the rpm, deb, etc, packaging. bintar created tarball should use a common directory for prefix --- Key: HADOOP-7521 URL: https://issues.apache.org/jira/browse/HADOOP-7521 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, Maven, Linux/Mac Reporter: Eric Yang Assignee: Eric Yang Attachments: HADOOP-7521.patch The binary tarball contains the directory structure like: {noformat} hadoop-common-0.23.0-SNAPSHOT-bin/bin /etc/hadoop /libexec /sbin /share/hadoop/common {noformat} It would be nice to rename the prefix directory to a common directory where it is common to all Hadoop stack software. Therefore, user can untar hbase, hadoop, zookeeper, pig, hive all into the same location and run from the top level directory without manually renaming them to the same directory again. By default the prefix directory can be /usr. Hence, it could merge with the base OS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7521) bintar created tarball should use a common directory for prefix
[ https://issues.apache.org/jira/browse/HADOOP-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080161#comment-13080161 ] Allen Wittenauer commented on HADOOP-7521: -- bq. Allen, the isolated tarball (pkgname-version) is still supported by tar profile. We are discussing merged layout here. If merged layout is: bq. Therefore, user can untar hbase, hadoop, zookeeper, pig, hive all into the same location and run from the top level directory without manually renaming them to the same directory again. then I'm still -1. That is just a flawed idea to try to treat tar as equivalent of rpm. They aren't. bintar created tarball should use a common directory for prefix --- Key: HADOOP-7521 URL: https://issues.apache.org/jira/browse/HADOOP-7521 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, Maven, Linux/Mac Reporter: Eric Yang Assignee: Eric Yang Attachments: HADOOP-7521.patch The binary tarball contains the directory structure like: {noformat} hadoop-common-0.23.0-SNAPSHOT-bin/bin /etc/hadoop /libexec /sbin /share/hadoop/common {noformat} It would be nice to rename the prefix directory to a common directory where it is common to all Hadoop stack software. Therefore, user can untar hbase, hadoop, zookeeper, pig, hive all into the same location and run from the top level directory without manually renaming them to the same directory again. By default the prefix directory can be /usr. Hence, it could merge with the base OS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7521) bintar created tarball should use a common directory for prefix
[ https://issues.apache.org/jira/browse/HADOOP-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080183#comment-13080183 ] Allen Wittenauer commented on HADOOP-7521: -- You mean other than the fact that few to no other tarball on the Internet does this? People who use binary tarballs to deploy things where there is an RPM almost always want package separation and higher levels of control of where things get placed. Changing this paradigm is going to be surprising and counter to those end user goals. In other words: This isn't broke. Stop trying to fix it. bintar created tarball should use a common directory for prefix --- Key: HADOOP-7521 URL: https://issues.apache.org/jira/browse/HADOOP-7521 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, Maven, Linux/Mac Reporter: Eric Yang Assignee: Eric Yang Attachments: HADOOP-7521.patch The binary tarball contains the directory structure like: {noformat} hadoop-common-0.23.0-SNAPSHOT-bin/bin /etc/hadoop /libexec /sbin /share/hadoop/common {noformat} It would be nice to rename the prefix directory to a common directory where it is common to all Hadoop stack software. Therefore, user can untar hbase, hadoop, zookeeper, pig, hive all into the same location and run from the top level directory without manually renaming them to the same directory again. By default the prefix directory can be /usr. Hence, it could merge with the base OS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7521) bintar created tarball should use a common directory for prefix
[ https://issues.apache.org/jira/browse/HADOOP-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080202#comment-13080202 ] Allen Wittenauer commented on HADOOP-7521: -- bq. It is standard practice with popular Ops tools. Yet your examples are dev tools. -1 remains. Might as well close this as won't fix. bintar created tarball should use a common directory for prefix --- Key: HADOOP-7521 URL: https://issues.apache.org/jira/browse/HADOOP-7521 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, Maven, Linux/Mac Reporter: Eric Yang Assignee: Eric Yang Attachments: HADOOP-7521.patch The binary tarball contains the directory structure like: {noformat} hadoop-common-0.23.0-SNAPSHOT-bin/bin /etc/hadoop /libexec /sbin /share/hadoop/common {noformat} It would be nice to rename the prefix directory to a common directory where it is common to all Hadoop stack software. Therefore, user can untar hbase, hadoop, zookeeper, pig, hive all into the same location and run from the top level directory without manually renaming them to the same directory again. By default the prefix directory can be /usr. Hence, it could merge with the base OS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7521) bintar created tarball should use a common directory for prefix
[ https://issues.apache.org/jira/browse/HADOOP-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080251#comment-13080251 ] Allen Wittenauer commented on HADOOP-7521: -- Beyond just the tarbomb problem, you've got file and permission problems. bintar created tarball should use a common directory for prefix --- Key: HADOOP-7521 URL: https://issues.apache.org/jira/browse/HADOOP-7521 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 0.23.0 Environment: Java 6, Maven, Linux/Mac Reporter: Eric Yang Assignee: Eric Yang Attachments: HADOOP-7521.patch The binary tarball contains the directory structure like: {noformat} hadoop-common-0.23.0-SNAPSHOT-bin/bin /etc/hadoop /libexec /sbin /share/hadoop/common {noformat} It would be nice to rename the prefix directory to a common directory where it is common to all Hadoop stack software. Therefore, user can untar hbase, hadoop, zookeeper, pig, hive all into the same location and run from the top level directory without manually renaming them to the same directory again. By default the prefix directory can be /usr. Hence, it could merge with the base OS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7550) Need for Integrity Validation of RPC
[ https://issues.apache.org/jira/browse/HADOOP-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087431#comment-13087431 ] Allen Wittenauer commented on HADOOP-7550: -- From what I remember, krb5 vs krb5i was like 5-10% perf degradation. krb5p was like another 5%. I'd expect going from nothing to krb5i or krb5p to be fairly horrific. On the plus side, these are already implemented, known quantities, etc. With hardware accelerated crypto now common, the numbers are likely lower for anyone using anything relatively modern on non-Intel gear. For Intel-gear, enabling AES support would probably help. Need for Integrity Validation of RPC Key: HADOOP-7550 URL: https://issues.apache.org/jira/browse/HADOOP-7550 Project: Hadoop Common Issue Type: Improvement Components: ipc Reporter: Dave Thompson Assignee: Dave Thompson Some recent investigation of network packet corruption has shown a need for hadoop RPC integrity validation beyond assurances already provided by 802.3 link layer and TCP 16-bit CRC. During an unusual occurrence on a 4k node cluster, we've seen as high as 4 TCP anomalies per second on a single node, sustained over an hour (14k per hour). A TCP anomaly would be an escaped link layer packet that resulted in a TCP CRC failure, TCP packet out of sequence or TCP packet size error. According to this paper[*]: http://tinyurl.com/3aue72r TCP's 16-bit CRC has an effective detection rate of 2^10. 1 in 1024 errors may escape detection, and in fact what originally alerted us to this issue was seeing failures due to bit-errors in hadoop traffic. Extrapolating from that paper, one might expect 14 escaped packet errors per hour for that single node of a 4k cluster. While the above error rate was unusually high due to a broadband aggregate switch issue, hadoop not having an integrity check on RPC makes it problematic to discover, and limit any potential data damage due to acting on a corrupt RPC message. 
-- [*] In case this jira outlives that tinyurl, the IEEE paper cited is: Performance of Checksums and CRCs over Real Data by Jonathan Stone, Michael Greenwald, Craig Partridge, Jim Hughes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7596) Enable jsvc to work with Hadoop RPM package
[ https://issues.apache.org/jira/browse/HADOOP-7596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095440#comment-13095440 ] Allen Wittenauer commented on HADOOP-7596: -- bq. Hadoop only works with Sun Java. This isn't true and one of the reasons why attempting to figure out which java to use programmatically is full of pot holes. Enable jsvc to work with Hadoop RPM package --- Key: HADOOP-7596 URL: https://issues.apache.org/jira/browse/HADOOP-7596 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 0.20.204.0 Environment: Java 6, RedHat EL 5.6 Reporter: Eric Yang Assignee: Eric Yang Fix For: 0.20.205.0 Attachments: HADOOP-7596.patch For secure Hadoop 0.20.2xx cluster, datanode can only run with 32 bit jvm because Hadoop only packages 32 bit jsvc. The build process should download proper jsvc versions base on the build architecture. In addition, the shell script should be enhanced to locate hadoop jar files in the proper location. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7596) Enable jsvc to work with Hadoop RPM package
[ https://issues.apache.org/jira/browse/HADOOP-7596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095500#comment-13095500 ] Allen Wittenauer commented on HADOOP-7596: -- http://wiki.apache.org/hadoop/HadoopJavaVersions Enable jsvc to work with Hadoop RPM package --- Key: HADOOP-7596 URL: https://issues.apache.org/jira/browse/HADOOP-7596 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 0.20.204.0 Environment: Java 6, RedHat EL 5.6 Reporter: Eric Yang Assignee: Eric Yang Fix For: 0.20.205.0 Attachments: HADOOP-7596.patch For secure Hadoop 0.20.2xx cluster, datanode can only run with 32 bit jvm because Hadoop only packages 32 bit jsvc. The build process should download proper jsvc versions base on the build architecture. In addition, the shell script should be enhanced to locate hadoop jar files in the proper location. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7603) Set default hdfs, mapred uid, and hadoop group gid for RPM packages
[ https://issues.apache.org/jira/browse/HADOOP-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096074#comment-13096074 ] Allen Wittenauer commented on HADOOP-7603: -- bq. What group uses 49? wnn uses it: http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s1-users-groups-standard-users.html Set default hdfs, mapred uid, and hadoop group gid for RPM packages --- Key: HADOOP-7603 URL: https://issues.apache.org/jira/browse/HADOOP-7603 Project: Hadoop Common Issue Type: Bug Environment: Java, Redhat EL, Ubuntu Reporter: Eric Yang Assignee: Eric Yang Hadoop rpm package creates hdfs, mapred users, and hadoop group for automatically setting up pid directory and log directory with proper permission. The default headless users should have a fixed uid, and gid numbers defined. Searched through the standard uid and gid on both Redhat and Debian distro. It looks like:
{noformat}
uid: 201 for hdfs
uid: 202 for mapred
gid: 49 for hadoop
{noformat}
would be free for use. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7624) Set things up for a top level hadoop-tools module
[ https://issues.apache.org/jira/browse/HADOOP-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102835#comment-13102835 ] Allen Wittenauer commented on HADOOP-7624: -- We need a rule set of what goes into tools so that we don't create contrib v2. Until that, I'm very much -1 on this. Set things up for a top level hadoop-tools module - Key: HADOOP-7624 URL: https://issues.apache.org/jira/browse/HADOOP-7624 Project: Hadoop Common Issue Type: Bug Components: build Reporter: Vinod Kumar Vavilapalli See this thread: http://markmail.org/thread/cxtz3i6lvztfgfxn We need to get things up and running for a top level hadoop-tools module. DistCpV2 will be the first resident of this new home. Things we need: - The module itself and a top level pom with appropriate dependencies - Integration with the patch builds for the new module - Integration with the post-commit and nightly builds for the new module. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7624) Set things up for a top level hadoop-tools module
[ https://issues.apache.org/jira/browse/HADOOP-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102921#comment-13102921 ] Allen Wittenauer commented on HADOOP-7624: -- It is not acceptable to say we're going to create this anyway and deal with the consequences later. Set things up for a top level hadoop-tools module - Key: HADOOP-7624 URL: https://issues.apache.org/jira/browse/HADOOP-7624 Project: Hadoop Common Issue Type: Bug Components: build Reporter: Vinod Kumar Vavilapalli See this thread: http://markmail.org/thread/cxtz3i6lvztfgfxn We need to get things up and running for a top level hadoop-tools module. DistCpV2 will be the first resident of this new home. Things we need: - The module itself and a top level pom with appropriate dependencies - Integration with the patch builds for the new module - Integration with the post-commit and nightly builds for the new module. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7624) Set things up for a top level hadoop-tools module
[ https://issues.apache.org/jira/browse/HADOOP-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102963#comment-13102963 ] Allen Wittenauer commented on HADOOP-7624: -- This JIRA was set up under the pretense of creating the hadoop-tools space. Even the summary statement says: Set things up for a top level hadoop-tools module. It seems logical to me that this is the space where this discussion needs to happen. Set things up for a top level hadoop-tools module - Key: HADOOP-7624 URL: https://issues.apache.org/jira/browse/HADOOP-7624 Project: Hadoop Common Issue Type: Bug Components: build Reporter: Vinod Kumar Vavilapalli See this thread: http://markmail.org/thread/cxtz3i6lvztfgfxn We need to get things up and running for a top level hadoop-tools module. DistCpV2 will be the first resident of this new home. Things we need: - The module itself and a top level pom with appropriate dependencies - Integration with the patch builds for the new module - Integration with the post-commit and nightly builds for the new module. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7624) Set things up for a top level hadoop-tools module
[ https://issues.apache.org/jira/browse/HADOOP-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103080#comment-13103080 ] Allen Wittenauer commented on HADOOP-7624: -- No, there was not consensus. I even said in the mailing list that I would oppose this without some rules to prevent this turning into contrib v2.0. I honestly think that the only way to prevent this from turning into a complete mess is to essentially make it a full-fledged sub-project. Set things up for a top level hadoop-tools module - Key: HADOOP-7624 URL: https://issues.apache.org/jira/browse/HADOOP-7624 Project: Hadoop Common Issue Type: Bug Components: build Reporter: Vinod Kumar Vavilapalli See this thread: http://markmail.org/thread/cxtz3i6lvztfgfxn We need to get things up and running for a top level hadoop-tools module. DistCpV2 will be the first resident of this new home. Things we need: - The module itself and a top level pom with appropriate dependencies - Integration with the patch builds for the new module - Integration with the post-commit and nightly builds for the new module. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7624) Set things up for a top level hadoop-tools module
[ https://issues.apache.org/jira/browse/HADOOP-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103834#comment-13103834 ] Allen Wittenauer commented on HADOOP-7624: -- I see a handful of choices:
a) rename contrib to tools and quit lying to ourselves that putting random stuff in a different directory makes them special
b) integrate these components directly into the mapreduce jar
c) make a new Hadoop sub project to hold these random things
d) just keep these components in contrib
e) make tools separate from contrib, but actually put some rules and process around what goes in there so that we don't end up with the same mess we had before
I basically don't want to see a repeat of history. If we don't do this now, in a year or three we're going to be back to "we need to prune contrib^H^H^H^H^H^H^Htools of all this abandoned source". If these things are important, just integrate them directly into the mainline jars and be done with it. Set things up for a top level hadoop-tools module - Key: HADOOP-7624 URL: https://issues.apache.org/jira/browse/HADOOP-7624 Project: Hadoop Common Issue Type: Bug Components: build Reporter: Vinod Kumar Vavilapalli Assignee: Alejandro Abdelnur Attachments: HADOOP-7624.patch See this thread: http://markmail.org/thread/cxtz3i6lvztfgfxn We need to get things up and running for a top level hadoop-tools module. DistCpV2 will be the first resident of this new home. Things we need: - The module itself and a top level pom with appropriate dependencies - Integration with the patch builds for the new module - Integration with the post-commit and nightly builds for the new module. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-7624) Set things up for a top level hadoop-tools module
[ https://issues.apache.org/jira/browse/HADOOP-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107297#comment-13107297 ]

Allen Wittenauer commented on HADOOP-7624:
------------------------------------------

I'm removing my -1. Commit away.
[jira] [Resolved] (HADOOP-7228) jar names are not compatible with 0.20.2
[ https://issues.apache.org/jira/browse/HADOOP-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved HADOOP-7228.
--------------------------------------
Resolution: Won't Fix

Surprise.

jar names are not compatible with 0.20.2
----------------------------------------
Key: HADOOP-7228
URL: https://issues.apache.org/jira/browse/HADOOP-7228
Project: Hadoop Common
Issue Type: Bug
Components: documentation
Affects Versions: 0.20.203.0
Reporter: Allen Wittenauer
Priority: Critical

The jars in 203 are named differently vs. Apache Hadoop 0.20.2. I understand this was done to make the Maven people less cranky. However, this breaks compatibility, especially for streaming users. We need to make sure we have a release note or something significant so that users aren't taken by surprise.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
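The breakage for streaming users can be sketched as follows. The switch described in the report was from a version-first naming scheme to the Maven artifact-first convention; the exact file names and directory below are assumptions for illustration, not the actual 0.20.203 layout:

```shell
# Hypothetical demo: a script that hard-codes one jar name breaks when the
# naming scheme changes. A glob tolerant of both schemes keeps it working.
# (Both file names here are illustrative, not the real release artifacts.)
tmp=$(mktemp -d)
touch "$tmp/hadoop-0.20.2-streaming.jar"      # old version-first style
touch "$tmp/hadoop-streaming-0.20.203.0.jar"  # Maven artifact-first style

# One pattern that matches either naming scheme:
matches=$(ls "$tmp"/hadoop-*streaming*.jar | wc -l)
echo "$matches"
```

A release note could recommend a glob like this so user scripts survive the rename without pinning a specific version string.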
[jira] [Commented] (HADOOP-7228) jar names are not compatible with 0.20.2
[ https://issues.apache.org/jira/browse/HADOOP-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112895#comment-13112895 ]

Allen Wittenauer commented on HADOOP-7228:
------------------------------------------

Pretty much too late. branch-20-security is all about breaking compatibility with 0.20.[0-2], it seems.