[jira] [Commented] (HADOOP-10273) Fix 'mvn site'
[ https://issues.apache.org/jira/browse/HADOOP-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880808#comment-13880808 ] Akira AJISAKA commented on HADOOP-10273: I reproduced this issue with Maven 3.1.1. +1, I verified the patch fixes the build break both on trunk and branch-2. Fix 'mvn site' -- Key: HADOOP-10273 URL: https://issues.apache.org/jira/browse/HADOOP-10273 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 3.0.0, 2.2.0 Reporter: Arpit Agarwal Attachments: HADOOP-10273.patch 'mvn site' fails with {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-site-plugin:3.0:site (default-site) on project hadoop-main: Execution default-site of goal org.apache.maven.plugins:maven-site-plugin:3.0:site failed: A required class was missing while executing org.apache.maven.plugins:maven-site-plugin:3.0:site: org/sonatype/aether/graph/DependencyFilter [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/AetherClassNotFound {code} Looks related to https://cwiki.apache.org/confluence/display/MAVEN/AetherClassNotFound Bumping the maven-site-plugin version should fix it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10273) Fix 'mvn site'
[ https://issues.apache.org/jira/browse/HADOOP-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HADOOP-10273: --- Assignee: Arpit Agarwal Target Version/s: 2.4.0 Affects Version/s: 2.2.0 Hadoop Flags: Reviewed Status: Patch Available (was: Open) Starting Jenkins. Fix 'mvn site' -- Key: HADOOP-10273 URL: https://issues.apache.org/jira/browse/HADOOP-10273 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 2.2.0, 3.0.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HADOOP-10273.patch 'mvn site' fails with {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-site-plugin:3.0:site (default-site) on project hadoop-main: Execution default-site of goal org.apache.maven.plugins:maven-site-plugin:3.0:site failed: A required class was missing while executing org.apache.maven.plugins:maven-site-plugin:3.0:site: org/sonatype/aether/graph/DependencyFilter [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/AetherClassNotFound {code} Looks related to https://cwiki.apache.org/confluence/display/MAVEN/AetherClassNotFound Bumping the maven-site-plugin version should fix it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10273) Fix 'mvn site'
[ https://issues.apache.org/jira/browse/HADOOP-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HADOOP-10273: --- Environment: Maven 3.1.x Fix 'mvn site' -- Key: HADOOP-10273 URL: https://issues.apache.org/jira/browse/HADOOP-10273 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 3.0.0, 2.2.0 Environment: Maven 3.1.x Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HADOOP-10273.patch 'mvn site' fails with {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-site-plugin:3.0:site (default-site) on project hadoop-main: Execution default-site of goal org.apache.maven.plugins:maven-site-plugin:3.0:site failed: A required class was missing while executing org.apache.maven.plugins:maven-site-plugin:3.0:site: org/sonatype/aether/graph/DependencyFilter [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/AetherClassNotFound {code} Looks related to https://cwiki.apache.org/confluence/display/MAVEN/AetherClassNotFound Bumping the maven-site-plugin version should fix it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HADOOP-10275) Serialization should remove its type parameter
Hiroshi Ikeda created HADOOP-10275: -- Summary: Serialization should remove its type parameter Key: HADOOP-10275 URL: https://issues.apache.org/jira/browse/HADOOP-10275 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.2.0 Reporter: Hiroshi Ikeda Priority: Minor org.apache.hadoop.io.serializer.Serialization is defined as:
{code}
public interface Serialization<T> {
  ...
  Serializer<T> getSerializer(Class<T> c);
  Deserializer<T> getDeserializer(Class<T> c);
}
{code}
but the type parameter T is semantically invalid, and type mismatches in the code are suppressed by explicit casts and annotations. This interface should be defined as follows:
{code}
public interface Serialization {
  ...
  <T> Serializer<T> getSerializer(Class<T> c);
  <T> Deserializer<T> getDeserializer(Class<T> c);
}
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
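The motivation can be seen with plain JDK types. Below is a minimal sketch (illustrative names, not Hadoop's actual classes): a serialization factory serves many classes at runtime, so no single interface-level T can be chosen up front; a per-method `<T>` binds the type at each call site instead, with no unchecked casts needed.

```java
import java.util.function.Function;

// Hypothetical sketch of the proposed shape: type parameter on the method,
// not the interface, so one instance can serve arbitrary classes type-safely.
interface Serialization {
    <T> Function<T, byte[]> getSerializer(Class<T> c);
}

class ToStringSerialization implements Serialization {
    @Override
    public <T> Function<T, byte[]> getSerializer(Class<T> c) {
        // Placeholder serializer: renders any object via toString()
        return obj -> obj.toString().getBytes();
    }
}

public class SerializationDemo {
    public static void main(String[] args) {
        Serialization s = new ToStringSerialization();
        // The same instance serves different classes without casts.
        byte[] a = s.getSerializer(Integer.class).apply(42);
        byte[] b = s.getSerializer(String.class).apply("hi");
        System.out.println(new String(a) + " " + new String(b));
    }
}
```

With the interface-level T of the current API, an implementation registered for many classes has to be cast to `Serialization<T>` for each use, which is exactly the suppressed mismatch the report describes.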
[jira] [Updated] (HADOOP-10275) Serialization should remove its type parameter
[ https://issues.apache.org/jira/browse/HADOOP-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroshi Ikeda updated HADOOP-10275: --- Attachment: HADOOP-10275.patch Added a patch for 2.2.0 Serialization should remove its type parameter -- Key: HADOOP-10275 URL: https://issues.apache.org/jira/browse/HADOOP-10275 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.2.0 Reporter: Hiroshi Ikeda Priority: Minor Attachments: HADOOP-10275.patch org.apache.hadoop.io.serializer.Serialization is defined as:
{code}
public interface Serialization<T> {
  ...
  Serializer<T> getSerializer(Class<T> c);
  Deserializer<T> getDeserializer(Class<T> c);
}
{code}
but the type parameter T is semantically invalid, and type mismatches in the code are suppressed by explicit casts and annotations. This interface should be defined as follows:
{code}
public interface Serialization {
  ...
  <T> Serializer<T> getSerializer(Class<T> c);
  <T> Deserializer<T> getDeserializer(Class<T> c);
}
{code}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HADOOP-10276) CLONE - RawLocalFs#getFileLinkStatus does not fill in the link owner and mode
Jason Lowe created HADOOP-10276: --- Summary: CLONE - RawLocalFs#getFileLinkStatus does not fill in the link owner and mode Key: HADOOP-10276 URL: https://issues.apache.org/jira/browse/HADOOP-10276 Project: Hadoop Common Issue Type: Bug Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.4.0 {{RawLocalFs#getFileLinkStatus}} does not actually get the owner and mode of the symlink, but instead uses the owner and mode of the symlink target. If the target can't be found, it fills in bogus values (the empty string and FsPermission.getDefault) for these. Symlinks have an owner distinct from the owner of the target they point to, and getFileLinkStatus ought to expose this. In some operating systems, symlinks can have a permission other than 0777. We ought to expose this in RawLocalFilesystem and other places, although we don't necessarily have to support this behavior in HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
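The distinction the description draws maps onto lstat(2). A minimal JDK sketch (not Hadoop code, assuming a POSIX filesystem) of reading the attributes of the link itself rather than its target:

```java
import java.nio.file.*;
import java.nio.file.attribute.*;

public class LinkStat {
    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("linkstat");
        Path target = Files.createFile(dir.resolve("target"));
        Path link = Files.createSymbolicLink(dir.resolve("link"), target);

        // NOFOLLOW_LINKS is the lstat(2) analogue: it stats the link itself,
        // so even a dangling target would not yield bogus owner/mode values.
        PosixFileAttributes linkAttrs = Files.readAttributes(
                link, PosixFileAttributes.class, LinkOption.NOFOLLOW_LINKS);

        System.out.println("link owner: " + linkAttrs.owner().getName());
        System.out.println("link mode:  "
                + PosixFilePermissions.toString(linkAttrs.permissions()));
    }
}
```

Without the `LinkOption`, `readAttributes` follows the link, which is the stat-the-target behavior the issue complains about.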
[jira] [Updated] (HADOOP-10276) RawLocalFs#getFileLinkStatus does not fill in the link owner and mode
[ https://issues.apache.org/jira/browse/HADOOP-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HADOOP-10276: Affects Version/s: 2.2.0 Fix Version/s: (was: 2.4.0) Assignee: (was: Colin Patrick McCabe) Summary: RawLocalFs#getFileLinkStatus does not fill in the link owner and mode (was: CLONE - RawLocalFs#getFileLinkStatus does not fill in the link owner and mode) Cloned from HADOOP-9652, as that JIRA lays the groundwork but does not completely fix the originally reported issue. RawLocalFs#getFileLinkStatus does not fill in the link owner and mode - Key: HADOOP-10276 URL: https://issues.apache.org/jira/browse/HADOOP-10276 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe {{RawLocalFs#getFileLinkStatus}} does not actually get the owner and mode of the symlink, but instead uses the owner and mode of the symlink target. If the target can't be found, it fills in bogus values (the empty string and FsPermission.getDefault) for these. Symlinks have an owner distinct from the owner of the target they point to, and getFileLinkStatus ought to expose this. In some operating systems, symlinks can have a permission other than 0777. We ought to expose this in RawLocalFilesystem and other places, although we don't necessarily have to support this behavior in HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10276) RawLocalFs#getFileLinkStatus does not fill in the link owner and mode by default
[ https://issues.apache.org/jira/browse/HADOOP-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HADOOP-10276: Summary: RawLocalFs#getFileLinkStatus does not fill in the link owner and mode by default (was: RawLocalFs#getFileLinkStatus does not fill in the link owner and mode) RawLocalFs#getFileLinkStatus does not fill in the link owner and mode by default Key: HADOOP-10276 URL: https://issues.apache.org/jira/browse/HADOOP-10276 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe {{RawLocalFs#getFileLinkStatus}} does not actually get the owner and mode of the symlink, but instead uses the owner and mode of the symlink target. If the target can't be found, it fills in bogus values (the empty string and FsPermission.getDefault) for these. Symlinks have an owner distinct from the owner of the target they point to, and getFileLinkStatus ought to expose this. In some operating systems, symlinks can have a permission other than 0777. We ought to expose this in RawLocalFilesystem and other places, although we don't necessarily have to support this behavior in HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-9652) Allow RawLocalFs#getFileLinkStatus to fill in the link owner and mode if requested
[ https://issues.apache.org/jira/browse/HADOOP-9652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HADOOP-9652: --- Issue Type: Improvement (was: Bug) Summary: Allow RawLocalFs#getFileLinkStatus to fill in the link owner and mode if requested (was: RawLocalFs#getFileLinkStatus does not fill in the link owner and mode) Hadoop Flags: Reviewed Updating summary to reflect this is adding the ability to get a link's owner and mode but only if requested. Filed HADOOP-10276 to track the fix for the original reported issue. Allow RawLocalFs#getFileLinkStatus to fill in the link owner and mode if requested -- Key: HADOOP-9652 URL: https://issues.apache.org/jira/browse/HADOOP-9652 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.4.0 Attachments: 0001-temporarily-disable-HADOOP-9652.patch, hadoop-9452-1.patch, hadoop-9652-2.patch, hadoop-9652-3.patch, hadoop-9652-4.patch, hadoop-9652-5.patch, hadoop-9652-6.patch, hadoop-9652-workaround.patch {{RawLocalFs#getFileLinkStatus}} does not actually get the owner and mode of the symlink, but instead uses the owner and mode of the symlink target. If the target can't be found, it fills in bogus values (the empty string and FsPermission.getDefault) for these. Symlinks have an owner distinct from the owner of the target they point to, and getFileLinkStatus ought to expose this. In some operating systems, symlinks can have a permission other than 0777. We ought to expose this in RawLocalFilesystem and other places, although we don't necessarily have to support this behavior in HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10245) Hadoop command line always appends -Xmx option twice
[ https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881057#comment-13881057 ] Wei Yan commented on HADOOP-10245: -- [~shanyu], sorry for the late reply. So you mean we only let users specify -Xmx through $JAVA_HEAP_MAX? Hadoop command line always appends -Xmx option twice -- Key: HADOOP-10245 URL: https://issues.apache.org/jira/browse/HADOOP-10245 Project: Hadoop Common Issue Type: Bug Components: bin Affects Versions: 2.2.0 Reporter: shanyu zhao Assignee: shanyu zhao Attachments: HADOOP-10245.patch The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with -Xmx options twice. The impact is that any user-defined HADOOP_HEAP_SIZE env variable will take no effect because it is overwritten by the second -Xmx option. For example, here is the java cmd generated for the command hadoop fs -ls /. Notice that there are two -Xmx options: -Xmx1000m and -Xmx512m in the command line: java -Xmx1000m -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log -Dhadoop.root.logger=INFO,console,DRFA -Xmx512m -Dhadoop.security.logger=INFO,RFAS -classpath XXX org.apache.hadoop.fs.FsShell -ls / Here is the root cause: The call flow is: hadoop.sh calls hadoop_config.sh, which in turn calls hadoop-env.sh. In hadoop.sh, the command line is generated by the following pseudo code: java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ... In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as -Xmx1000m if the user didn't set the $HADOOP_HEAP_SIZE env variable. In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this: export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS" To fix this problem, we should remove the -Xmx512m from HADOOP_CLIENT_OPTS. If we really want to change the memory settings we need to use the $HADOOP_HEAP_SIZE env variable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
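A small sketch of why the duplication silently overrides the user's setting (the helper below is illustrative, not part of the scripts): HotSpot generally resolves a repeated -Xmx flag by letting the last occurrence win.

```java
import java.util.Arrays;
import java.util.List;

public class XmxDemo {
    // Mimics the JVM's last-one-wins resolution of a repeated -Xmx flag:
    // scan the argument list and keep only the final occurrence.
    static String effectiveXmx(List<String> argv) {
        String winner = null;
        for (String arg : argv) {
            if (arg.startsWith("-Xmx")) {
                winner = arg;
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        // The generated command line from the description, abbreviated:
        // the $JAVA_HEAP_MAX value comes first, HADOOP_CLIENT_OPTS second.
        List<String> argv = Arrays.asList(
                "-Xmx1000m", "-Dhadoop.log.file=hadoop.log", "-Xmx512m");
        System.out.println(effectiveXmx(argv)); // -Xmx512m
    }
}
```

This is why the -Xmx1000m derived from HADOOP_HEAP_SIZE never takes effect: the -Xmx512m appended later from HADOOP_CLIENT_OPTS shadows it.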
[jira] [Updated] (HADOOP-9652) Allow RawLocalFs#getFileLinkStatus to fill in the link owner and mode if requested
[ https://issues.apache.org/jira/browse/HADOOP-9652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HADOOP-9652: --- Resolution: Fixed Assignee: Andrew Wang (was: Colin Patrick McCabe) Status: Resolved (was: Patch Available) Thanks Andrew! I committed the addendum patch to trunk and branch-2. Allow RawLocalFs#getFileLinkStatus to fill in the link owner and mode if requested -- Key: HADOOP-9652 URL: https://issues.apache.org/jira/browse/HADOOP-9652 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Assignee: Andrew Wang Fix For: 2.4.0 Attachments: 0001-temporarily-disable-HADOOP-9652.patch, hadoop-9452-1.patch, hadoop-9652-2.patch, hadoop-9652-3.patch, hadoop-9652-4.patch, hadoop-9652-5.patch, hadoop-9652-6.patch, hadoop-9652-workaround.patch {{RawLocalFs#getFileLinkStatus}} does not actually get the owner and mode of the symlink, but instead uses the owner and mode of the symlink target. If the target can't be found, it fills in bogus values (the empty string and FsPermission.getDefault) for these. Symlinks have an owner distinct from the owner of the target they point to, and getFileLinkStatus ought to expose this. In some operating systems, symlinks can have a permission other than 0777. We ought to expose this in RawLocalFilesystem and other places, although we don't necessarily have to support this behavior in HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HADOOP-10272) Hadoop 2 -copyFromLocal fail when source is a folder and there are spaces in the path
[ https://issues.apache.org/jira/browse/HADOOP-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu reassigned HADOOP-10272: -- Assignee: Chuan Liu Hadoop 2 -copyFromLocal fail when source is a folder and there are spaces in the path --- Key: HADOOP-10272 URL: https://issues.apache.org/jira/browse/HADOOP-10272 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.2.0 Reporter: Shuaishuai Nie Assignee: Chuan Liu Repro steps: with a folder structure like: /ab/c d/ef.txt the hadoop command (hadoop fs -copyFromLocal /ab/ /) or (hadoop fs -copyFromLocal /ab/c d/ /) fails with the error: copyFromLocal: File file:/ab/c%20d/ef.txt does not exist However, the command (hadoop fs -copyFromLocal /ab/c d/ef.txt /) succeeds. It seems hadoop treats files and directories differently in copyFromLocal. This only happens in Hadoop 2 and is causing 2 Hive unit test failures (external_table_with_space_in_location_path.q and load_hdfs_file_with_space_in_the_name.q). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10272) Hadoop 2 -copyFromLocal fail when source is a folder and there are spaces in the path
[ https://issues.apache.org/jira/browse/HADOOP-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881351#comment-13881351 ] Chuan Liu commented on HADOOP-10272: I can take a look at the root cause. Hadoop 2 -copyFromLocal fail when source is a folder and there are spaces in the path --- Key: HADOOP-10272 URL: https://issues.apache.org/jira/browse/HADOOP-10272 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.2.0 Reporter: Shuaishuai Nie Assignee: Chuan Liu Repro steps: with a folder structure like: /ab/c d/ef.txt the hadoop command (hadoop fs -copyFromLocal /ab/ /) or (hadoop fs -copyFromLocal /ab/c d/ /) fails with the error: copyFromLocal: File file:/ab/c%20d/ef.txt does not exist However, the command (hadoop fs -copyFromLocal /ab/c d/ef.txt /) succeeds. It seems hadoop treats files and directories differently in copyFromLocal. This only happens in Hadoop 2 and is causing 2 Hive unit test failures (external_table_with_space_in_location_path.q and load_hdfs_file_with_space_in_the_name.q). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
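The `%20` in the error message suggests a percent-encoded URI form is being treated as a literal file name somewhere along the directory path. A small JDK sketch of the encode/decode asymmetry (plain `java.net.URI`, not Hadoop's `Path` handling):

```java
import java.net.URI;

public class SpacePath {
    public static void main(String[] args) throws Exception {
        // The multi-argument constructor percent-encodes the space...
        URI u = new URI("file", null, "/ab/c d/ef.txt", null);
        System.out.println(u);           // encoded form, contains c%20d
        System.out.println(u.getPath()); // decoded form, contains "c d"

        // ...but re-parsing the encoded string and reading the raw path keeps
        // %20 as literal characters, so a filesystem lookup of the raw form
        // fails exactly as in the reported error.
        System.out.println(new URI(u.toString()).getRawPath());
    }
}
```

If the directory-walking code path stringifies the URI and hands the encoded form to the local filesystem while the single-file path decodes it first, that would match the observed behavior; confirming that is the root-cause work mentioned above.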
[jira] [Updated] (HADOOP-10158) SPNEGO should work with multiple interfaces/SPNs.
[ https://issues.apache.org/jira/browse/HADOOP-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HADOOP-10158: -- Attachment: HADOOP-10158_multiplerealms.patch Attaching a patch with the following changes for consideration: 1. Add a _getServerPrincipals_ method in _SecurityUtil_ which handles multiple principals 2. Invoke it from _NameNodeHttpServer_ (for webhdfs) and _AuthenticationFilterInitializer_ (for external urls) in addition to _HttpServer_ (for internal urls) SPNEGO should work with multiple interfaces/SPNs. - Key: HADOOP-10158 URL: https://issues.apache.org/jira/browse/HADOOP-10158 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.2.0 Reporter: Kihwal Lee Assignee: Daryn Sharp Attachments: HADOOP-10158.patch, HADOOP-10158_multiplerealms.patch, HADOOP-10158_multiplerealms.patch, HADOOP-10158_multiplerealms.patch This is the list of internal servlets added by the namenode. | Name | Auth | Need to be accessible by end users | | StartupProgressServlet | none | no | | GetDelegationTokenServlet | internal SPNEGO | yes | | RenewDelegationTokenServlet | internal SPNEGO | yes | | CancelDelegationTokenServlet | internal SPNEGO | yes | | FsckServlet | internal SPNEGO | yes | | GetImageServlet | internal SPNEGO | no | | ListPathsServlet | token in query | yes | | FileDataServlet | token in query | yes | | FileChecksumServlets | token in query | yes | | ContentSummaryServlet | token in query | yes | GetDelegationTokenServlet, RenewDelegationTokenServlet, CancelDelegationTokenServlet and FsckServlet are accessed by end users, but hard-coded to use the internal SPNEGO filter. If a name node HTTP server binds to multiple external IP addresses, the internal SPNEGO service principal name may not work with an address to which end users are connecting. The current SPNEGO implementation in Hadoop is limited to using a single service principal per filter.
If the underlying hadoop kerberos authentication handler cannot easily be modified, we can at least create a separate auth filter for the end-user facing servlets so that their service principals can be independently configured. If not defined, it should fall back to the current behavior. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
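The multi-principal idea in the patch notes can be sketched with a hypothetical helper (the name `expandPrincipals` and the `_HOST` expansion pattern are illustrative; the actual `getServerPrincipals` signature in `SecurityUtil` may differ): expand a principal pattern once per bound hostname so each interface a server listens on gets a matching SPN.

```java
import java.util.ArrayList;
import java.util.List;

public class SpnPatterns {
    // Hypothetical helper: one SPN per hostname the HTTP server binds to,
    // so SPNEGO can succeed regardless of which address a client used.
    static List<String> expandPrincipals(String pattern, List<String> hosts) {
        List<String> out = new ArrayList<>();
        for (String host : hosts) {
            out.add(pattern.replace("_HOST", host));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> spns = expandPrincipals("HTTP/_HOST@EXAMPLE.COM",
                List.of("nn1.example.com", "nn1-ext.example.com"));
        spns.forEach(System.out::println);
    }
}
```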
[jira] [Updated] (HADOOP-10158) SPNEGO should work with multiple interfaces/SPNs.
[ https://issues.apache.org/jira/browse/HADOOP-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HADOOP-10158: Priority: Critical (was: Major) SPNEGO should work with multiple interfaces/SPNs. - Key: HADOOP-10158 URL: https://issues.apache.org/jira/browse/HADOOP-10158 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.2.0 Reporter: Kihwal Lee Assignee: Daryn Sharp Priority: Critical Attachments: HADOOP-10158.patch, HADOOP-10158_multiplerealms.patch, HADOOP-10158_multiplerealms.patch, HADOOP-10158_multiplerealms.patch This is the list of internal servlets added by namenode. | Name | Auth | Need to be accessible by end users | | StartupProgressServlet | none | no | | GetDelegationTokenServlet | internal SPNEGO | yes | | RenewDelegationTokenServlet | internal SPNEGO | yes | | CancelDelegationTokenServlet | internal SPNEGO | yes | | FsckServlet | internal SPNEGO | yes | | GetImageServlet | internal SPNEGO | no | | ListPathsServlet | token in query | yes | | FileDataServlet | token in query | yes | | FileChecksumServlets | token in query | yes | | ContentSummaryServlet | token in query | yes | GetDelegationTokenServlet, RenewDelegationTokenServlet, CancelDelegationTokenServlet and FsckServlet are accessed by end users, but hard-coded to use the internal SPNEGO filter. If a name node HTTP server binds to multiple external IP addresses, the internal SPNEGO service principal name may not work with an address to which end users are connecting. The current SPNEGO implementation in Hadoop is limited to use a single service principal per filter. If the underlying hadoop kerberos authentication handler cannot easily be modified, we can at least create a separate auth filter for the end-user facing servlets so that their service principals can be independently configured. If not defined, it should fall back to the current behavior. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-9640) RPC Congestion Control with FairCallQueue
[ https://issues.apache.org/jira/browse/HADOOP-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881428#comment-13881428 ] Kihwal Lee commented on HADOOP-9640: Can low priority requests starve higher priority requests? If a low priority call queue is full and all reader threads are blocked on put() for adding calls belonging to that queue, newly arriving higher priority requests won't get processed even if their corresponding queue is not full. If the request rate stays greater than service rate for some time in this state, the listen queue will likely overflow and all types of requests will suffer regardless of priority. RPC Congestion Control with FairCallQueue - Key: HADOOP-9640 URL: https://issues.apache.org/jira/browse/HADOOP-9640 Project: Hadoop Common Issue Type: Improvement Affects Versions: 3.0.0, 2.2.0 Reporter: Xiaobo Peng Labels: hdfs, qos, rpc Attachments: MinorityMajorityPerformance.pdf, NN-denial-of-service-updated-plan.pdf, faircallqueue.patch, faircallqueue2.patch, faircallqueue3.patch, faircallqueue4.patch, faircallqueue5.patch, faircallqueue6.patch, faircallqueue7_with_runtime_swapping.patch, rpc-congestion-control-draft-plan.pdf Several production Hadoop cluster incidents occurred where the Namenode was overloaded and failed to respond. We can improve quality of service for users during namenode peak loads by replacing the FIFO call queue with a [Fair Call Queue|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf]. (this plan supersedes rpc-congestion-control-draft-plan). Excerpted from the communication of one incident, “The map task of a user was creating huge number of small files in the user directory.
Due to the heavy load on NN, the JT also was unable to communicate with NN...The cluster became responsive only once the job was killed.” Excerpted from the communication of another incident, “Namenode was overloaded by GetBlockLocation requests (Correction: should be getFileInfo requests. the job had a bug that called getFileInfo for a nonexistent file in an endless loop). All other requests to namenode were also affected by this and hence all jobs slowed down. Cluster almost came to a grinding halt…Eventually killed jobtracker to kill all jobs that are running.” Excerpted from HDFS-945, “We've seen defective applications cause havoc on the NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories (60k files) etc.” -- This message was sent by Atlassian JIRA (v6.1.5#6160)
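The blocking-put hazard raised in the comment above can be sketched in isolation (illustrative classes, not the FairCallQueue patch itself): with a bounded low-priority queue, `put` parks the reader thread, while `offer` lets it fail fast and keep servicing other traffic.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class TwoLevelQueue<E> {
    private final BlockingQueue<E> high = new LinkedBlockingQueue<>();
    private final BlockingQueue<E> low;

    TwoLevelQueue(int lowCapacity) {
        this.low = new LinkedBlockingQueue<>(lowCapacity);
    }

    // A reader thread parked in put() on the full low queue cannot return to
    // the socket, so high-priority calls behind it are never enqueued — the
    // starvation scenario described in the comment.
    void put(E call, boolean isHigh) throws InterruptedException {
        (isHigh ? high : low).put(call);
    }

    // Non-blocking variant: the caller learns the queue is full immediately
    // and can drop, retry, or push back instead of stalling the reader.
    boolean offer(E call, boolean isHigh) {
        return (isHigh ? high : low).offer(call);
    }

    // Strict priority on the dequeue side.
    E poll() {
        E call = high.poll();
        return call != null ? call : low.poll();
    }
}
```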
[jira] [Commented] (HADOOP-10255) Copy the HttpServer in 2.2 back to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881445#comment-13881445 ] stack commented on HADOOP-10255: [~wheat9] Any luck w/ a branch-2 patch? Any ETA? Would be good all around getting this 2.4 blocker behind us. Thanks. Copy the HttpServer in 2.2 back to branch-2 --- Key: HADOOP-10255 URL: https://issues.apache.org/jira/browse/HADOOP-10255 Project: Hadoop Common Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Blocker Fix For: 2.4.0 Attachments: HADOOP-10255.000.patch, HADOOP-10255.001.patch, HADOOP-10255.002.patch, HADOOP-10255.003.patch As suggested in HADOOP-10253, HBase needs a temporary copy of {{HttpServer}} from branch-2.2 to make sure it works across multiple 2.x releases. This patch renames the current {{HttpServer}} into {{HttpServer2}}, and brings the {{HttpServer}} in branch-2.2 into the repository. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HADOOP-10277) setfacl -x fails to parse ACL spec if trying to remove the mask entry.
Chris Nauroth created HADOOP-10277: -- Summary: setfacl -x fails to parse ACL spec if trying to remove the mask entry. Key: HADOOP-10277 URL: https://issues.apache.org/jira/browse/HADOOP-10277 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth You should be able to use setfacl -x to remove the mask entry (which then triggers recalculation of an automatically inferred mask if the file has an extended ACL). Right now, this causes a failure to parse the ACL spec due to a bug in {{AclEntry#parseAclSpec}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
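The pitfall is easy to reproduce with a naive split-based parser (a sketch of the failure mode, not `AclEntry#parseAclSpec` itself): a removal spec omits the permission field, and a mask entry additionally has no name, so fixed-arity parsing rejects it.

```java
public class AclSpecSketch {
    // Strict variant: a removal entry such as "mask::" has a type, an empty
    // name, and no permissions, so a parser that demands "type:name:perm"
    // throws here — the hypothesized shape of the reported bug.
    static String parseTypeStrict(String entry) {
        String[] parts = entry.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("bad entry: " + entry);
        }
        return parts[0];
    }

    // Tolerant variant: split with a negative limit so trailing empty fields
    // survive, then accept entries whose permission field is absent.
    static String parseTypeLenient(String entry) {
        String[] parts = entry.split(":", -1);
        return parts[0];
    }

    public static void main(String[] args) {
        System.out.println(parseTypeLenient("mask::"));
        System.out.println(parseTypeLenient("user:bob:"));
    }
}
```

Note that `"mask::".split(":")` silently drops the trailing empty strings and yields a single-element array, which is why the strict form fails only for entries like the mask.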
[jira] [Commented] (HADOOP-10255) Copy the HttpServer in 2.2 back to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881449#comment-13881449 ] Haohui Mai commented on HADOOP-10255: - Still waiting for Jenkins to come up. My plan is to apply this patch to branch-2, then directly pull the HttpServer from branch-2.2 into branch-2. How does that sound? Copy the HttpServer in 2.2 back to branch-2 --- Key: HADOOP-10255 URL: https://issues.apache.org/jira/browse/HADOOP-10255 Project: Hadoop Common Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Blocker Fix For: 2.4.0 Attachments: HADOOP-10255.000.patch, HADOOP-10255.001.patch, HADOOP-10255.002.patch, HADOOP-10255.003.patch As suggested in HADOOP-10253, HBase needs a temporary copy of {{HttpServer}} from branch-2.2 to make sure it works across multiple 2.x releases. This patch renames the current {{HttpServer}} into {{HttpServer2}}, and brings the {{HttpServer}} in branch-2.2 into the repository. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10277) setfacl -x fails to parse ACL spec if trying to remove the mask entry.
[ https://issues.apache.org/jira/browse/HADOOP-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881462#comment-13881462 ] Chris Nauroth commented on HADOOP-10277: Thank you to [~sachinjose2...@gmail.com] and [~renil.joseph] for reporting this bug. setfacl -x fails to parse ACL spec if trying to remove the mask entry. -- Key: HADOOP-10277 URL: https://issues.apache.org/jira/browse/HADOOP-10277 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth You should be able to use setfacl -x to remove the mask entry (which then triggers recalculation of an automatically inferred mask if the file has an extended ACL). Right now, this causes a failure to parse the ACL spec due to a bug in {{AclEntry#parseAclSpec}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-9640) RPC Congestion Control with FairCallQueue
[ https://issues.apache.org/jira/browse/HADOOP-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881465#comment-13881465 ] Chris Li commented on HADOOP-9640: -- [~kihwal] In the first 6 versions of this patch, this does indeed happen. It's partially alleviated due to the round-robin withdrawal from the queues. In the latest iteration of the patch (7), the reader threads would lock on the queue's putLock like they do in trunk. I think this behavior is more intuitive. Today I will be breaking this JIRA up to make it easier to review. RPC Congestion Control with FairCallQueue - Key: HADOOP-9640 URL: https://issues.apache.org/jira/browse/HADOOP-9640 Project: Hadoop Common Issue Type: Improvement Affects Versions: 3.0.0, 2.2.0 Reporter: Xiaobo Peng Labels: hdfs, qos, rpc Attachments: MinorityMajorityPerformance.pdf, NN-denial-of-service-updated-plan.pdf, faircallqueue.patch, faircallqueue2.patch, faircallqueue3.patch, faircallqueue4.patch, faircallqueue5.patch, faircallqueue6.patch, faircallqueue7_with_runtime_swapping.patch, rpc-congestion-control-draft-plan.pdf Several production Hadoop cluster incidents occurred where the Namenode was overloaded and failed to respond. We can improve quality of service for users during namenode peak loads by replacing the FIFO call queue with a [Fair Call Queue|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf]. (this plan supersedes rpc-congestion-control-draft-plan). Excerpted from the communication of one incident, “The map task of a user was creating huge number of small files in the user directory. Due to the heavy load on NN, the JT also was unable to communicate with NN...The cluster became responsive only once the job was killed.” Excerpted from the communication of another incident, “Namenode was overloaded by GetBlockLocation requests (Correction: should be getFileInfo requests.
the job had a bug that called getFileInfo for a nonexistent file in an endless loop). All other requests to namenode were also affected by this and hence all jobs slowed down. Cluster almost came to a grinding halt…Eventually killed jobtracker to kill all jobs that are running.” Excerpted from HDFS-945, “We've seen defective applications cause havoc on the NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories (60k files) etc.” -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HADOOP-10278) Refactor to make CallQueue pluggable
Chris Li created HADOOP-10278: - Summary: Refactor to make CallQueue pluggable Key: HADOOP-10278 URL: https://issues.apache.org/jira/browse/HADOOP-10278 Project: Hadoop Common Issue Type: Sub-task Components: ipc Reporter: Chris Li * Refactor CallQueue into an interface, base, and default implementation that matches today's behavior * Make the call queue impl configurable, keyed on port so that we minimize coupling -- This message was sent by Atlassian JIRA (v6.1.5#6160)
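The two bullets above might look like this in outline (hypothetical names; the actual patch's interface may differ): an interface, a default FIFO implementation matching today's behavior, and reflective construction driven by a per-port configuration key.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// The interface callers program against.
interface CallQueue<E> {
    void put(E call) throws InterruptedException;
    E take() throws InterruptedException;
}

// Default implementation preserving today's FIFO behavior.
class FifoCallQueue<E> implements CallQueue<E> {
    private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();
    public void put(E call) throws InterruptedException { queue.put(call); }
    public E take() throws InterruptedException { return queue.take(); }
}

class CallQueueFactory {
    // Keying the config on the port (e.g. a key like "ipc.8020.callqueue.impl",
    // name illustrative) lets each server choose its implementation without
    // coupling servers to one another.
    static <E> CallQueue<E> create(String implClassName) throws Exception {
        @SuppressWarnings("unchecked")
        CallQueue<E> q = (CallQueue<E>) Class.forName(implClassName)
                .getDeclaredConstructor().newInstance();
        return q;
    }
}
```

Swapping in a fair or multi-level queue then means shipping another `CallQueue` implementation and changing one config value, which is the decoupling the sub-task describes.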
[jira] [Commented] (HADOOP-10246) define FS permissions model with tests
[ https://issues.apache.org/jira/browse/HADOOP-10246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881474#comment-13881474 ] Colin Patrick McCabe commented on HADOOP-10246: --- I agree that we should have more tests for this. However, the current behavior seems correct to me. It's modelled closely on the traditional POSIX behavior, where {{mkdir}} honors {{umask}}, but {{chmod}} does not. Anything else would be surprising for users coming from traditional filesystems. Another reason for the current behavior is that if {{chmod}} consulted {{umask}}, there would be no way for users to set less restrictive permissions than specified in {{umask}}. This is contrary to the purpose of {{umask}}, which is just to be a helpful default, not a hard constraint. define FS permissions model with tests -- Key: HADOOP-10246 URL: https://issues.apache.org/jira/browse/HADOOP-10246 Project: Hadoop Common Issue Type: Sub-task Components: fs Reporter: Steve Loughran Priority: Minor It's interesting that HDFS mkdirs(dir, permission) uses the umask, but setPermissions() does not. The permissions model, including umask logic, should be defined and have tests implemented by those filesystems that support permissions-based security. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
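The POSIX arithmetic behind that comment, as a sketch: mkdir applies the umask to the requested mode (`mode & ~umask`), while chmod applies the mode verbatim.

```java
public class UmaskDemo {
    public static void main(String[] args) {
        int requested = 0777; // octal literals, as in POSIX mode bits
        int umask = 022;

        // mkdir-style: the effective mode is masked by the umask default.
        int mkdirMode = requested & ~umask;

        // chmod-style: the mode is taken as given, so users can always set
        // permissions looser than the umask would allow.
        int chmodMode = requested;

        System.out.printf("mkdir: %o, chmod: %o%n", mkdirMode, chmodMode);
    }
}
```

With a umask of 022, requesting 0777 on mkdir yields an effective 0755, while chmod 0777 stays 0777; if chmod also consulted the umask, the looser setting would be unreachable, which is the point made above.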
[jira] [Updated] (HADOOP-10278) Refactor to make CallQueue pluggable
[ https://issues.apache.org/jira/browse/HADOOP-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10278: -- Status: Patch Available (was: Open) Refactor to make CallQueue pluggable Key: HADOOP-10278 URL: https://issues.apache.org/jira/browse/HADOOP-10278 Project: Hadoop Common Issue Type: Sub-task Components: ipc Reporter: Chris Li Attachments: subtask_refactor_callqueue.patch * Refactor CallQueue into an interface, base, and default implementation that matches today's behavior * Make the call queue impl configurable, keyed on port so that we minimize coupling -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10246) define FS permissions model with tests
[ https://issues.apache.org/jira/browse/HADOOP-10246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881476#comment-13881476 ] Colin Patrick McCabe commented on HADOOP-10246: --- when referring to HDFS's {{chmod}} I mean {{setPermissions}} in HDFS, of course define FS permissions model with tests -- Key: HADOOP-10246 URL: https://issues.apache.org/jira/browse/HADOOP-10246 Project: Hadoop Common Issue Type: Sub-task Components: fs Reporter: Steve Loughran Priority: Minor It's interesting that HDFS mkdirs(dir, permission) uses the umask, but setPermissions() does not The permissions model, including umask logic should be defined and have tests implemented by those filesystems that support permissions-based security -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10278) Refactor to make CallQueue pluggable
[ https://issues.apache.org/jira/browse/HADOOP-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10278: -- Attachment: subtask_refactor_callqueue.patch Refactor to make CallQueue pluggable Key: HADOOP-10278 URL: https://issues.apache.org/jira/browse/HADOOP-10278 Project: Hadoop Common Issue Type: Sub-task Components: ipc Reporter: Chris Li Attachments: subtask_refactor_callqueue.patch * Refactor CallQueue into an interface, base, and default implementation that matches today's behavior * Make the call queue impl configurable, keyed on port so that we minimize coupling -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10278) Refactor to make CallQueue pluggable
[ https://issues.apache.org/jira/browse/HADOOP-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10278: -- Attachment: subtask_refactor_callqueue2.patch This version has the unit tests Refactor to make CallQueue pluggable Key: HADOOP-10278 URL: https://issues.apache.org/jira/browse/HADOOP-10278 Project: Hadoop Common Issue Type: Sub-task Components: ipc Reporter: Chris Li Attachments: subtask_refactor_callqueue.patch * Refactor CallQueue into an interface, base, and default implementation that matches today's behavior * Make the call queue impl configurable, keyed on port so that we minimize coupling -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10278) Refactor to make CallQueue pluggable
[ https://issues.apache.org/jira/browse/HADOOP-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10278: -- Attachment: (was: subtask_refactor_callqueue2.patch) Refactor to make CallQueue pluggable Key: HADOOP-10278 URL: https://issues.apache.org/jira/browse/HADOOP-10278 Project: Hadoop Common Issue Type: Sub-task Components: ipc Reporter: Chris Li Attachments: subtask_refactor_callqueue.patch * Refactor CallQueue into an interface, base, and default implementation that matches today's behavior * Make the call queue impl configurable, keyed on port so that we minimize coupling -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10278) Refactor to make CallQueue pluggable
[ https://issues.apache.org/jira/browse/HADOOP-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10278: -- Attachment: subtask_refactor_callqueue2.patch This version has the unit tests and doesn't accidentally delete a different file. Refactor to make CallQueue pluggable Key: HADOOP-10278 URL: https://issues.apache.org/jira/browse/HADOOP-10278 Project: Hadoop Common Issue Type: Sub-task Components: ipc Reporter: Chris Li Attachments: subtask_refactor_callqueue.patch, subtask_refactor_callqueue2.patch * Refactor CallQueue into an interface, base, and default implementation that matches today's behavior * Make the call queue impl configurable, keyed on port so that we minimize coupling -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HADOOP-10279) Create multiplexer, a requirement for the fair queue
Chris Li created HADOOP-10279: - Summary: Create multiplexer, a requirement for the fair queue Key: HADOOP-10279 URL: https://issues.apache.org/jira/browse/HADOOP-10279 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HADOOP-10280) Make Schedulables return a configurable identity of user or group
Chris Li created HADOOP-10280: - Summary: Make Schedulables return a configurable identity of user or group Key: HADOOP-10280 URL: https://issues.apache.org/jira/browse/HADOOP-10280 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
Chris Li created HADOOP-10281: - Summary: Create a scheduler, which assigns schedulables a priority level Key: HADOOP-10281 URL: https://issues.apache.org/jira/browse/HADOOP-10281 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HADOOP-10282) Create a FairCallQueue: a multi-level call queue which schedules incoming calls and multiplexes outgoing calls
Chris Li created HADOOP-10282: - Summary: Create a FairCallQueue: a multi-level call queue which schedules incoming calls and multiplexes outgoing calls Key: HADOOP-10282 URL: https://issues.apache.org/jira/browse/HADOOP-10282 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HADOOP-10283) Add metrics to the FairCallQueue
Chris Li created HADOOP-10283: - Summary: Add metrics to the FairCallQueue Key: HADOOP-10283 URL: https://issues.apache.org/jira/browse/HADOOP-10283 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HADOOP-10285) Allow CallQueue impls to be swapped at runtime
Chris Li created HADOOP-10285: - Summary: Allow CallQueue impls to be swapped at runtime Key: HADOOP-10285 URL: https://issues.apache.org/jira/browse/HADOOP-10285 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HADOOP-10284) Add metrics to the HistoryRpcScheduler
Chris Li created HADOOP-10284: - Summary: Add metrics to the HistoryRpcScheduler Key: HADOOP-10284 URL: https://issues.apache.org/jira/browse/HADOOP-10284 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HADOOP-10286) Allow RPCCallBenchmark to benchmark calls by different users
Chris Li created HADOOP-10286: - Summary: Allow RPCCallBenchmark to benchmark calls by different users Key: HADOOP-10286 URL: https://issues.apache.org/jira/browse/HADOOP-10286 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10279) Create multiplexer, a requirement for the fair queue
[ https://issues.apache.org/jira/browse/HADOOP-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10279: -- Attachment: subtask2_add_mux.patch Depends on subtask1: This multiplexer enables takers of the FairCallQueue to draw in a weighted round-robin fashion to combat starvation of low-priority Schedulables. Weights are configurable as follows: ipc.8020.wrr-multiplexer.weights = 30, 20, 5 This means queue0 will be drawn from thirty times, queue1 twenty times, queue2 five times, and then the cycle repeats. Create multiplexer, a requirement for the fair queue Key: HADOOP-10279 URL: https://issues.apache.org/jira/browse/HADOOP-10279 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li Attachments: subtask2_add_mux.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
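The weighted round-robin draw order described above can be sketched as follows (class and method names are illustrative, not the patch's actual code):

```java
// Hypothetical sketch of the weighted round-robin order described in the
// comment: with weights {30, 20, 5}, queue 0 is drawn from 30 times, then
// queue 1 twenty times, then queue 2 five times, and the cycle repeats.
public class WeightedRoundRobinMux {
    private final int[] weights;
    private int currentQueue = 0;
    private int drawsLeft;

    public WeightedRoundRobinMux(int[] weights) {
        this.weights = weights;
        this.drawsLeft = weights[0];
    }

    // Index of the sub-queue the next take() should drain first.
    public synchronized int nextQueueIndex() {
        while (drawsLeft == 0) { // advance past exhausted (or zero-weight) queues
            currentQueue = (currentQueue + 1) % weights.length;
            drawsLeft = weights[currentQueue];
        }
        drawsLeft--;
        return currentQueue;
    }
}
```

Because every queue eventually gets a turn, a flood of high-priority calls can no longer starve the low-priority queues, which is the stated goal of the multiplexer.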
[jira] [Updated] (HADOOP-10279) Create multiplexer, a requirement for the fair queue
[ https://issues.apache.org/jira/browse/HADOOP-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10279: -- Status: Patch Available (was: Open) Create multiplexer, a requirement for the fair queue Key: HADOOP-10279 URL: https://issues.apache.org/jira/browse/HADOOP-10279 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li Attachments: subtask2_add_mux.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10278) Refactor to make CallQueue pluggable
[ https://issues.apache.org/jira/browse/HADOOP-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10278: -- Attachment: subtask_refactor_callqueue3.patch This version doesn't include an irrelevant test Refactor to make CallQueue pluggable Key: HADOOP-10278 URL: https://issues.apache.org/jira/browse/HADOOP-10278 Project: Hadoop Common Issue Type: Sub-task Components: ipc Reporter: Chris Li Attachments: subtask_refactor_callqueue.patch, subtask_refactor_callqueue2.patch, subtask_refactor_callqueue3.patch * Refactor CallQueue into an interface, base, and default implementation that matches today's behavior * Make the call queue impl configurable, keyed on port so that we minimize coupling -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10280) Make Schedulables return a configurable identity of user or group
[ https://issues.apache.org/jira/browse/HADOOP-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10280: -- Attachment: subtask3_schedulable_identities.patch Allow Schedulables to be queried for identity, which can be configured as follows: ipc.8020.call.identity Make Schedulables return a configurable identity of user or group - Key: HADOOP-10280 URL: https://issues.apache.org/jira/browse/HADOOP-10280 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li Attachments: subtask3_schedulable_identities.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10278) Refactor to make CallQueue pluggable
[ https://issues.apache.org/jira/browse/HADOOP-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10278: -- Attachment: (was: subtask_refactor_callqueue3.patch) Refactor to make CallQueue pluggable Key: HADOOP-10278 URL: https://issues.apache.org/jira/browse/HADOOP-10278 Project: Hadoop Common Issue Type: Sub-task Components: ipc Reporter: Chris Li Attachments: subtask_refactor_callqueue.patch, subtask_refactor_callqueue2.patch * Refactor CallQueue into an interface, base, and default implementation that matches today's behavior * Make the call queue impl configurable, keyed on port so that we minimize coupling -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10281) Create a scheduler, which assigns schedulables a priority level
[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10281: -- Attachment: subtask4_scheduler.patch The scheduler assigns schedulables a priority level based on the past history of requests. It can be configured as follows: ipc.8020.history-scheduler.history-length = 1000 The number of past requests to remember and compare ipc.8020.history-scheduler.thresholds = 33, 66 Dependent on the history-length and the number of priority levels: defines the thresholds that separate each priority level. In this example, we have 3 priority levels and a history length of 100, so we assign thusly: * Queue 2 if count > 66 * Queue 1 if count > 33 * Queue 0 otherwise Create a scheduler, which assigns schedulables a priority level --- Key: HADOOP-10281 URL: https://issues.apache.org/jira/browse/HADOOP-10281 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li Attachments: subtask4_scheduler.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
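The threshold lookup described above amounts to counting how many thresholds the caller's request count exceeds (a sketch with hypothetical names, not the patch's code):

```java
// Hypothetical sketch of the threshold mapping: with thresholds {33, 66},
// a caller seen more than 66 times in the history window goes to queue 2,
// more than 33 times to queue 1, otherwise queue 0.
public class ThresholdScheduler {
    private final int[] thresholds; // ascending, e.g. {33, 66}

    public ThresholdScheduler(int[] thresholds) {
        this.thresholds = thresholds;
    }

    // Priority level (queue index) for a caller's recent request count.
    public int priorityFor(int count) {
        int level = 0;
        for (int t : thresholds) {
            if (count > t) {
                level++;
            }
        }
        return level;
    }
}
```

With n priority levels, n - 1 thresholds are needed, which is why the example pairs 3 levels with the two values 33 and 66.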
[jira] [Commented] (HADOOP-9640) RPC Congestion Control with FairCallQueue
[ https://issues.apache.org/jira/browse/HADOOP-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881556#comment-13881556 ] Daryn Sharp commented on HADOOP-9640: - Agreed, this needs subtasks. General comments/requests: # Please make the default callq a {{BlockingQueue}} again, and have your custom implementations conform to the interface. # The default callq should remain a {{LinkedBlockingQueue}}, not a {{FIFOCallQueue}}. You're doing some pretty tricky locking and I'd rather trust the JDK. # Call.getRemoteUser() would be much cleaner to get the UGI than an interface + enum to get user and group. # Using the literal string unknown! for a user or group is not a good idea. The more I think about it, multiple queues will exacerbate the congestion problem as Kihwal points out. For that reason, I'd like to see minimal invasiveness in the Server class - I'll feel safe and you are free to experiment with alternate implementations. RPC Congestion Control with FairCallQueue - Key: HADOOP-9640 URL: https://issues.apache.org/jira/browse/HADOOP-9640 Project: Hadoop Common Issue Type: Improvement Affects Versions: 3.0.0, 2.2.0 Reporter: Xiaobo Peng Labels: hdfs, qos, rpc Attachments: MinorityMajorityPerformance.pdf, NN-denial-of-service-updated-plan.pdf, faircallqueue.patch, faircallqueue2.patch, faircallqueue3.patch, faircallqueue4.patch, faircallqueue5.patch, faircallqueue6.patch, faircallqueue7_with_runtime_swapping.patch, rpc-congestion-control-draft-plan.pdf Several production Hadoop cluster incidents occurred where the Namenode was overloaded and failed to respond. We can improve quality of service for users during namenode peak loads by replacing the FIFO call queue with a [Fair Call Queue|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf]. (this plan supersedes rpc-congestion-control-draft-plan). 
Excerpted from the communication of one incident, “The map task of a user was creating huge number of small files in the user directory. Due to the heavy load on NN, the JT also was unable to communicate with NN...The cluster became responsive only once the job was killed.” Excerpted from the communication of another incident, “Namenode was overloaded by GetBlockLocation requests (Correction: should be getFileInfo requests. the job had a bug that called getFileInfo for a nonexistent file in an endless loop). All other requests to namenode were also affected by this and hence all jobs slowed down. Cluster almost came to a grinding halt…Eventually killed jobtracker to kill all jobs that are running.” Excerpted from HDFS-945, “We've seen defective applications cause havoc on the NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories (60k files) etc.” -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10278) Refactor to make CallQueue pluggable
[ https://issues.apache.org/jira/browse/HADOOP-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10278: -- Attachment: (was: subtask_refactor_callqueue2.patch) Refactor to make CallQueue pluggable Key: HADOOP-10278 URL: https://issues.apache.org/jira/browse/HADOOP-10278 Project: Hadoop Common Issue Type: Sub-task Components: ipc Reporter: Chris Li * Refactor CallQueue into an interface, base, and default implementation that matches today's behavior * Make the call queue impl configurable, keyed on port so that we minimize coupling -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10278) Refactor to make CallQueue pluggable
[ https://issues.apache.org/jira/browse/HADOOP-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10278: -- Attachment: (was: subtask_refactor_callqueue.patch) Refactor to make CallQueue pluggable Key: HADOOP-10278 URL: https://issues.apache.org/jira/browse/HADOOP-10278 Project: Hadoop Common Issue Type: Sub-task Components: ipc Reporter: Chris Li * Refactor CallQueue into an interface, base, and default implementation that matches today's behavior * Make the call queue impl configurable, keyed on port so that we minimize coupling -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-9640) RPC Congestion Control with FairCallQueue
[ https://issues.apache.org/jira/browse/HADOOP-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881571#comment-13881571 ] Chris Li commented on HADOOP-9640: -- [~daryn] Thanks for your feedback. Some points of clarification: 3. The identity is meant to be configurable, so you can schedule by user, by group, and in the future by job. 4. Any suggestions? RPC Congestion Control with FairCallQueue - Key: HADOOP-9640 URL: https://issues.apache.org/jira/browse/HADOOP-9640 Project: Hadoop Common Issue Type: Improvement Affects Versions: 3.0.0, 2.2.0 Reporter: Xiaobo Peng Labels: hdfs, qos, rpc Attachments: MinorityMajorityPerformance.pdf, NN-denial-of-service-updated-plan.pdf, faircallqueue.patch, faircallqueue2.patch, faircallqueue3.patch, faircallqueue4.patch, faircallqueue5.patch, faircallqueue6.patch, faircallqueue7_with_runtime_swapping.patch, rpc-congestion-control-draft-plan.pdf Several production Hadoop cluster incidents occurred where the Namenode was overloaded and failed to respond. We can improve quality of service for users during namenode peak loads by replacing the FIFO call queue with a [Fair Call Queue|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf]. (this plan supersedes rpc-congestion-control-draft-plan). Excerpted from the communication of one incident, “The map task of a user was creating huge number of small files in the user directory. Due to the heavy load on NN, the JT also was unable to communicate with NN...The cluster became responsive only once the job was killed.” Excerpted from the communication of another incident, “Namenode was overloaded by GetBlockLocation requests (Correction: should be getFileInfo requests. the job had a bug that called getFileInfo for a nonexistent file in an endless loop). All other requests to namenode were also affected by this and hence all jobs slowed down. 
Cluster almost came to a grinding halt…Eventually killed jobtracker to kill all jobs that are running.” Excerpted from HDFS-945, “We've seen defective applications cause havoc on the NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories (60k files) etc.” -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HADOOP-10278) Refactor to make CallQueue pluggable
[ https://issues.apache.org/jira/browse/HADOOP-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10278: -- Attachment: subtask1.patch As per [~daryn]'s suggestions, this patch retains the original interface of the callQueue in Server. This patch allows the Server to use a custom implementation of BlockingQueue if the user defines ipc.8020.callqueue.impl It includes one such implementation, the FIFOCallQueue, which simply imitates the LinkedBlockingQueue (and uses the same 2-lock algorithm used in the JDK's implementation). Though it seems redundant, the FIFOCallQueue will have greater flexibility in that it can be swapped out at runtime (coming in a later patch). Refactor to make CallQueue pluggable Key: HADOOP-10278 URL: https://issues.apache.org/jira/browse/HADOOP-10278 Project: Hadoop Common Issue Type: Sub-task Components: ipc Reporter: Chris Li Attachments: subtask1.patch * Refactor CallQueue into an interface, base, and default implementation that matches today's behavior * Make the call queue impl configurable, keyed on port so that we minimize coupling -- This message was sent by Atlassian JIRA (v6.1.5#6160)
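The fallback behavior described in this patch (use the class named by ipc.&lt;port&gt;.callqueue.impl if set, otherwise a LinkedBlockingQueue) might look roughly like the sketch below; the factory class and its single-int-constructor convention are assumptions for illustration, not the patch's actual code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of reflection-based call-queue selection: instantiate the
// configured class, or fall back on a LinkedBlockingQueue (today's
// behavior). The (int capacity) constructor convention is an assumption.
public class CallQueueFactory {
    @SuppressWarnings("unchecked")
    public static <E> BlockingQueue<E> create(String implClass, int capacity) {
        if (implClass == null) {
            return new LinkedBlockingQueue<E>(capacity); // default behavior
        }
        try {
            Class<?> cls = Class.forName(implClass);
            return (BlockingQueue<E>) cls.getConstructor(int.class)
                .newInstance(capacity);
        } catch (Exception e) {
            throw new RuntimeException(
                "Could not construct call queue " + implClass, e);
        }
    }
}
```

Keeping the Server-facing type a plain BlockingQueue, as Daryn requested, means the Server code never needs to know which implementation it got.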
[jira] [Commented] (HADOOP-9640) RPC Congestion Control with FairCallQueue
[ https://issues.apache.org/jira/browse/HADOOP-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881614#comment-13881614 ] Chris Li commented on HADOOP-9640: -- I've uploaded the first of the patches to https://issues.apache.org/jira/browse/HADOOP-10278 It allows the user to use a custom call queue specified via configuration, but falls back on a LinkedBlockingQueue otherwise. I'd like to take any further discussions about this aspect to the subtask, and get some feedback. Thanks RPC Congestion Control with FairCallQueue - Key: HADOOP-9640 URL: https://issues.apache.org/jira/browse/HADOOP-9640 Project: Hadoop Common Issue Type: Improvement Affects Versions: 3.0.0, 2.2.0 Reporter: Xiaobo Peng Labels: hdfs, qos, rpc Attachments: MinorityMajorityPerformance.pdf, NN-denial-of-service-updated-plan.pdf, faircallqueue.patch, faircallqueue2.patch, faircallqueue3.patch, faircallqueue4.patch, faircallqueue5.patch, faircallqueue6.patch, faircallqueue7_with_runtime_swapping.patch, rpc-congestion-control-draft-plan.pdf Several production Hadoop cluster incidents occurred where the Namenode was overloaded and failed to respond. We can improve quality of service for users during namenode peak loads by replacing the FIFO call queue with a [Fair Call Queue|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf]. (this plan supersedes rpc-congestion-control-draft-plan). Excerpted from the communication of one incident, “The map task of a user was creating huge number of small files in the user directory. Due to the heavy load on NN, the JT also was unable to communicate with NN...The cluster became responsive only once the job was killed.” Excerpted from the communication of another incident, “Namenode was overloaded by GetBlockLocation requests (Correction: should be getFileInfo requests. the job had a bug that called getFileInfo for a nonexistent file in an endless loop). 
All other requests to namenode were also affected by this and hence all jobs slowed down. Cluster almost came to a grinding halt…Eventually killed jobtracker to kill all jobs that are running.” Excerpted from HDFS-945, “We've seen defective applications cause havoc on the NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories (60k files) etc.” -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10167) Mark hadoop-common source as UTF-8 in Maven pom files / refactoring
[ https://issues.apache.org/jira/browse/HADOOP-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881683#comment-13881683 ] Hudson commented on HADOOP-10167: - SUCCESS: Integrated in Hadoop-trunk-Commit #5036 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5036/]) HADOOP-10167. Mark hadoop-common source as UTF-8 in Maven pom files / refactoring. Contributed by Mikhail Antonov. (cos: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1560831) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/pom.xml * /hadoop/common/trunk/hadoop-project/pom.xml * /hadoop/common/trunk/hadoop-tools/hadoop-distcp/pom.xml * /hadoop/common/trunk/hadoop-tools/hadoop-openstack/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/pom.xml * /hadoop/common/trunk/pom.xml Mark hadoop-common source as UTF-8 in Maven pom files / refactoring --- Key: HADOOP-10167 URL: https://issues.apache.org/jira/browse/HADOOP-10167 Project: Hadoop Common Issue Type: Improvement Components: build Affects Versions: 2.0.6-alpha Environment: Fedora 19 x86-64 Reporter: Mikhail Antonov Labels: build Fix For: 3.0.0, 2.3.0 Attachments: HADOOP-10167-1.patch While looking at BIGTOP-831, turned out that the way Bigtop calls maven build / site:site generation causes errors like this: [ERROR] Exit code: 1 - /home/user/jenkins/workspace/BigTop-RPM/label/centos-6-x86_64-HAD-1-buildbot/bigtop-repo/build/hadoop/rpm/BUILD/hadoop-2.0.2-alpha-src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/source/JvmMetricsInfo.java:31: error: unmappable character for encoding ANSI_X3.4-1968 [ERROR] JvmMetrics(JVM related metrics etc.), // record info?? Making the whole of hadoop-common use UTF-8 fixes that and seems in general a good thing to me. Attaching first version of patch for review. 
Original issue was observed on openjdk 7 (x86-64). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
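For reference, the conventional Maven way to pin source encoding, which is presumably what the patch adds to each listed pom.xml, is the project.build.sourceEncoding property:

```xml
<properties>
  <!-- Make javac and the reporting plugins read sources as UTF-8
       instead of the platform default (ANSI_X3.4-1968 in the failing build). -->
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
</properties>
```

Setting it in the parent pom lets child modules inherit the property, which matches the multi-module list of poms touched by the commit.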
[jira] [Commented] (HADOOP-9652) Allow RawLocalFs#getFileLinkStatus to fill in the link owner and mode if requested
[ https://issues.apache.org/jira/browse/HADOOP-9652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881681#comment-13881681 ] Hudson commented on HADOOP-9652: SUCCESS: Integrated in Hadoop-trunk-Commit #5036 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5036/]) Addendum patch for HADOOP-9652 to fix performance problems. Contributed by Andrew Wang (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1561038) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestSymlinkLocalFS.java Allow RawLocalFs#getFileLinkStatus to fill in the link owner and mode if requested -- Key: HADOOP-9652 URL: https://issues.apache.org/jira/browse/HADOOP-9652 Project: Hadoop Common Issue Type: Improvement Reporter: Colin Patrick McCabe Assignee: Andrew Wang Fix For: 2.4.0 Attachments: 0001-temporarily-disable-HADOOP-9652.patch, hadoop-9452-1.patch, hadoop-9652-2.patch, hadoop-9652-3.patch, hadoop-9652-4.patch, hadoop-9652-5.patch, hadoop-9652-6.patch, hadoop-9652-workaround.patch {{RawLocalFs#getFileLinkStatus}} does not actually get the owner and mode of the symlink, but instead uses the owner and mode of the symlink target. If the target can't be found, it fills in bogus values (the empty string and FsPermission.getDefault) for these. Symlinks have an owner distinct from the owner of the target they point to, and getFileLinkStatus ought to expose this. In some operating systems, symlinks can have a permission other than 0777. We ought to expose this in RawLocalFilesystem and other places, although we don't necessarily have to support this behavior in HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10248) Property name should be included in the exception where property value is null
[ https://issues.apache.org/jira/browse/HADOOP-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881680#comment-13881680 ] Hudson commented on HADOOP-10248: - SUCCESS: Integrated in Hadoop-trunk-Commit #5036 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5036/]) HADOOP-10248. Property name should be included in the exception where property value is null. Contributed by Akira AJISAKA. (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1560906) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfiguration.java Property name should be included in the exception where property value is null -- Key: HADOOP-10248 URL: https://issues.apache.org/jira/browse/HADOOP-10248 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: Akira AJISAKA Labels: newbie Fix For: 3.0.0, 2.3.0 Attachments: HADOOP-10248.2.patch, HADOOP-10248.patch I saw the following when trying to determine startup failure: {code} 2014-01-21 06:07:17,871 FATAL [master:h2-centos6-uns-1390276854-hbase-10:6] master.HMaster: Unhandled exception. Starting shutdown. 
java.lang.IllegalArgumentException: Property value must not be null at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) at org.apache.hadoop.conf.Configuration.set(Configuration.java:958) at org.apache.hadoop.conf.Configuration.set(Configuration.java:940) at org.apache.hadoop.http.HttpServer.initializeWebServer(HttpServer.java:510) at org.apache.hadoop.http.HttpServer.&lt;init&gt;(HttpServer.java:470) at org.apache.hadoop.http.HttpServer.&lt;init&gt;(HttpServer.java:458) at org.apache.hadoop.http.HttpServer.&lt;init&gt;(HttpServer.java:412) at org.apache.hadoop.hbase.util.InfoServer.&lt;init&gt;(InfoServer.java:59) {code} Property name should be included in the following exception: {code} Preconditions.checkArgument(value != null, "Property value must not be null"); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
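A minimal sketch of the requested fix follows; checkArgument here is a stand-in that mimics Guava's Preconditions so the example is self-contained, and the real patch would use Guava directly:

```java
// Sketch of the requested change: name the offending property in the
// exception message. checkArgument is a minimal stand-in for Guava's
// Preconditions.checkArgument so this compiles without Guava.
public class ConfigSetSketch {
    static void checkArgument(boolean ok, String template, Object... args) {
        if (!ok) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    public static void set(String name, String value) {
        checkArgument(name != null, "Property name must not be null");
        checkArgument(value != null,
            "The value of property %s must not be null", name);
        // ... store the property ...
    }
}
```

With the name in the message, a stack trace like the one above would immediately identify which configuration key was null instead of forcing a debugging session.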
[jira] [Commented] (HADOOP-10279) Create multiplexer, a requirement for the fair queue
[ https://issues.apache.org/jira/browse/HADOOP-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881688#comment-13881688 ] Hadoop QA commented on HADOOP-10279: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625146/subtask2_add_mux.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3466//console This message is automatically generated. Create multiplexer, a requirement for the fair queue Key: HADOOP-10279 URL: https://issues.apache.org/jira/browse/HADOOP-10279 Project: Hadoop Common Issue Type: Sub-task Reporter: Chris Li Attachments: subtask2_add_mux.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HADOOP-10287) FSOutputSummer should support any checksum size
Laurent Goujon created HADOOP-10287: --- Summary: FSOutputSummer should support any checksum size Key: HADOOP-10287 URL: https://issues.apache.org/jira/browse/HADOOP-10287 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.2.0, 3.0.0 Reporter: Laurent Goujon HADOOP-9114 only fixes the case where the checksum size is 0, but doesn't handle the generic case. FSOutputSummer should work with any checksum size (between 0 and 8, since Checksum.getValue() returns a long) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
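One way to handle the generic case is to emit only the low-order size bytes of the long checksum value; the sketch below shows just that arithmetic and makes no claim about FSOutputSummer's actual byte layout:

```java
// Hypothetical sketch of the generic case: serialize the low-order
// `size` bytes of Checksum.getValue() big-endian, for any size 0..8.
// This only illustrates the arithmetic, not FSOutputSummer's real code.
public class ChecksumBytes {
    static byte[] toBytes(long sum, int size) {
        if (size < 0 || size > 8) {
            throw new IllegalArgumentException("checksum size must be 0..8");
        }
        byte[] out = new byte[size];
        for (int i = size - 1; i >= 0; i--) {
            out[i] = (byte) (sum & 0xff);
            sum >>>= 8;
        }
        return out;
    }
}
```

Size 0 degenerates to an empty array (the HADOOP-9114 case) and size 8 writes the full long, covering both ends of the range the report asks for.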
[jira] [Commented] (HADOOP-10136) Custom JMX server to avoid random port usage by default JMX Server
[ https://issues.apache.org/jira/browse/HADOOP-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881732#comment-13881732 ] nijel commented on HADOOP-10136: bq. Unfortunately, some Hadoop daemons are using ports in the ephemeral range as if they were fixed ports. In this case we can change the default port, right? In the case of JMX, even if we want to configure the port, that is not possible. So I think it is better to keep this JMX server as an option. Custom JMX server to avoid random port usage by default JMX Server -- Key: HADOOP-10136 URL: https://issues.apache.org/jira/browse/HADOOP-10136 Project: Hadoop Common Issue Type: Improvement Reporter: Vinay Assignee: Vinay Attachments: HADOOP-10136.patch If any Java process wants to enable the JMX MBean server, the following VM arguments need to be passed. {code} -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=14005 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false{code} But the issue here is that this will use one more random port, other than 14005, while starting JMX. This can be a problem if that random port is used by some other service. So support a custom JMX server through which the random port can be avoided. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
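A custom JMX server along the lines discussed usually means creating the RMI registry and the connector server explicitly, so that both RMI endpoints sit on operator-chosen ports; a sketch using only the standard javax.management.remote API (port numbers and class name are illustrative, not the attached patch):

```java
import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Sketch of a custom JMX server bound to fixed ports: the RMI registry
// and the RMI server object each use an explicitly chosen port, so no
// ephemeral port is opened behind the operator's back.
public class FixedPortJmxServer {
    public static JMXConnectorServer start(int registryPort, int serverPort)
            throws Exception {
        LocateRegistry.createRegistry(registryPort);
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi://localhost:" + serverPort
            + "/jndi/rmi://localhost:" + registryPort + "/jmxrmi");
        JMXConnectorServer server = JMXConnectorServerFactory
            .newJMXConnectorServer(url, null,
                ManagementFactory.getPlatformMBeanServer());
        server.start();
        return server;
    }
}
```

The -Dcom.sun.management.jmxremote.port flag fixes only the registry port; the second, randomly chosen RMI server port is exactly what a programmatic server like this can pin down.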
[jira] [Commented] (HADOOP-8476) Remove duplicate VM arguments for hadoop daemon
[ https://issues.apache.org/jira/browse/HADOOP-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13881733#comment-13881733 ] nijel commented on HADOOP-8476: --- I also went wrong with the duplicate values for -Xmx! Vinay, can you update the patch? Remove duplicate VM arguments for hadoop daemon --- Key: HADOOP-8476 URL: https://issues.apache.org/jira/browse/HADOOP-8476 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Vinay Assignee: Vinay Priority: Minor Attachments: HADOOP-8476.patch, HADOOP-8476.patch Remove duplicate VM arguments passed to the hadoop daemon. Following are the VM arguments currently duplicated. {noformat}-Dproc_namenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Xmx128m -Xmx128m -Dhadoop.log.dir=/home/nn2/logs -Dhadoop.log.file=hadoop-root-namenode-HOST-xx-xx-xx-105.log -Dhadoop.home.dir=/home/nn2/ -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS{noformat} In the above VM arguments, -Xmx1000m will be overridden by -Xmx128m. BTW, the other duplicate arguments won't harm -- This message was sent by Atlassian JIRA (v6.1.5#6160)