[jira] [Commented] (YARN-200) yarn log does not output all needed information, and is in a binary format
[ https://issues.apache.org/jira/browse/YARN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607496#comment-13607496 ]

Hudson commented on YARN-200:

Integrated in Hadoop-Yarn-trunk #161 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/161/])
YARN-200. yarn log does not output all needed information, and is in a binary format. Contributed by Ravi Prakash (Revision 1458466)

Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458466
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogDumper.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java

yarn log does not output all needed information, and is in a binary format

Key: YARN-200
URL: https://issues.apache.org/jira/browse/YARN-200
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 0.23.5
Reporter: Robert Joseph Evans
Assignee: Ravi Prakash
Labels: usability
Fix For: 0.23.7, 2.0.5-beta
Attachments: YARN-200.patch, YARN-200.patch

yarn logs does not output attemptid, nodename, or container-id. Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts. Also, the output currently includes several binary characters. This is OK for being machine readable, but difficult for being human readable, or even for using standard tools like grep. The help message could also be made more useful to users.
[jira] [Commented] (YARN-200) yarn log does not output all needed information, and is in a binary format
[ https://issues.apache.org/jira/browse/YARN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607564#comment-13607564 ]

Hudson commented on YARN-200:

Integrated in Hadoop-Hdfs-0.23-Build #559 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/559/])
svn merge -c 1458466 FIXES: YARN-200. yarn log does not output all needed information, and is in a binary format. Contributed by Ravi Prakash (Revision 1458474)

Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458474
Files :
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogDumper.java
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java

yarn log does not output all needed information, and is in a binary format

Key: YARN-200
URL: https://issues.apache.org/jira/browse/YARN-200
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 0.23.5
Reporter: Robert Joseph Evans
Assignee: Ravi Prakash
Labels: usability
Fix For: 0.23.7, 2.0.5-beta
Attachments: YARN-200.patch, YARN-200.patch

yarn logs does not output attemptid, nodename, or container-id. Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts. Also, the output currently includes several binary characters. This is OK for being machine readable, but difficult for being human readable, or even for using standard tools like grep. The help message could also be made more useful to users.
[jira] [Commented] (YARN-200) yarn log does not output all needed information, and is in a binary format
[ https://issues.apache.org/jira/browse/YARN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607571#comment-13607571 ]

Hudson commented on YARN-200:

Integrated in Hadoop-Hdfs-trunk #1350 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1350/])
YARN-200. yarn log does not output all needed information, and is in a binary format. Contributed by Ravi Prakash (Revision 1458466)

Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458466
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogDumper.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java

yarn log does not output all needed information, and is in a binary format

Key: YARN-200
URL: https://issues.apache.org/jira/browse/YARN-200
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 0.23.5
Reporter: Robert Joseph Evans
Assignee: Ravi Prakash
Labels: usability
Fix For: 0.23.7, 2.0.5-beta
Attachments: YARN-200.patch, YARN-200.patch

yarn logs does not output attemptid, nodename, or container-id. Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts. Also, the output currently includes several binary characters. This is OK for being machine readable, but difficult for being human readable, or even for using standard tools like grep. The help message could also be made more useful to users.
[jira] [Commented] (YARN-200) yarn log does not output all needed information, and is in a binary format
[ https://issues.apache.org/jira/browse/YARN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607627#comment-13607627 ]

Hudson commented on YARN-200:

Integrated in Hadoop-Mapreduce-trunk #1378 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1378/])
YARN-200. yarn log does not output all needed information, and is in a binary format. Contributed by Ravi Prakash (Revision 1458466)

Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458466
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogDumper.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java

yarn log does not output all needed information, and is in a binary format

Key: YARN-200
URL: https://issues.apache.org/jira/browse/YARN-200
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 0.23.5
Reporter: Robert Joseph Evans
Assignee: Ravi Prakash
Labels: usability
Fix For: 0.23.7, 2.0.5-beta
Attachments: YARN-200.patch, YARN-200.patch

yarn logs does not output attemptid, nodename, or container-id. Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts. Also, the output currently includes several binary characters. This is OK for being machine readable, but difficult for being human readable, or even for using standard tools like grep. The help message could also be made more useful to users.
[jira] [Commented] (YARN-492) Too many open files error to launch a container
[ https://issues.apache.org/jira/browse/YARN-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607812#comment-13607812 ]

Hitesh Shah commented on YARN-492:

Please add details of which processes are using the ports in question. In addition to that, what configuration value was set to make use of port 50010 and/or 44871?

Too many open files error to launch a container

Key: YARN-492
URL: https://issues.apache.org/jira/browse/YARN-492
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.0.0-alpha
Environment: RedHat Linux
Reporter: Krishna Kishore Bonagiri

I am running a date command with YARN's distributed shell example in a loop of 1000 times in this way:

yarn jar /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar org.apache.hadoop.yarn.applications.distributedshell.Client --jar /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar --shell_command date --num_containers 2

Around the 730th time or so, I get an error in the node manager's log saying that it failed to launch a container because there are "Too many open files". When I observe through the lsof command, I find that one instance of this kind of file is left open for each run of the Application Master, and it keeps growing as I run the loop:

node1:44871->node1:50010

Thanks,
Kishore
[jira] [Commented] (YARN-492) Too many open files error to launch a container
[ https://issues.apache.org/jira/browse/YARN-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607819#comment-13607819 ]

Hitesh Shah commented on YARN-492:

Nevermind, 50010 is the default datanode port. What process is opening up 44871? If it is the node manager, do you have log aggregation enabled? Could you try running the test with log aggregation disabled and let us know if the problem is still reproducible?

Too many open files error to launch a container

Key: YARN-492
URL: https://issues.apache.org/jira/browse/YARN-492
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.0.0-alpha
Environment: RedHat Linux
Reporter: Krishna Kishore Bonagiri

I am running a date command with YARN's distributed shell example in a loop of 1000 times in this way:

yarn jar /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar org.apache.hadoop.yarn.applications.distributedshell.Client --jar /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar --shell_command date --num_containers 2

Around the 730th time or so, I get an error in the node manager's log saying that it failed to launch a container because there are "Too many open files". When I observe through the lsof command, I find that one instance of this kind of file is left open for each run of the Application Master, and it keeps growing as I run the loop:

node1:44871->node1:50010

Thanks,
Kishore
[jira] [Assigned] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

omkar vinit joshi reassigned YARN-112:

Assignee: omkar vinit joshi

Race in localization can cause containers to fail

Key: YARN-112
URL: https://issues.apache.org/jira/browse/YARN-112
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: omkar vinit joshi

On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up.
[jira] [Updated] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jian he updated YARN-479:

Attachment: YARN-479.1.patch

NM retry behavior for connection to RM should be similar for lost heartbeats

Key: YARN-479
URL: https://issues.apache.org/jira/browse/YARN-479
Project: Hadoop YARN
Issue Type: Bug
Reporter: Hitesh Shah
Assignee: jian he
Attachments: YARN-479.1.patch

Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607910#comment-13607910 ]

Hadoop QA commented on YARN-479:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12574579/YARN-479.1.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
  org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/549//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/549//console

This message is automatically generated.

NM retry behavior for connection to RM should be similar for lost heartbeats

Key: YARN-479
URL: https://issues.apache.org/jira/browse/YARN-479
Project: Hadoop YARN
Issue Type: Bug
Reporter: Hitesh Shah
Assignee: jian he
Attachments: YARN-479.1.patch

Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow.
[jira] [Updated] (YARN-490) TestDistributedShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated YARN-490:

Labels: windows (was: )

TestDistributedShell fails on Windows

Key: YARN-490
URL: https://issues.apache.org/jira/browse/YARN-490
Project: Hadoop YARN
Issue Type: Bug
Components: applications/distributed-shell
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Labels: windows
Attachments: YARN-490.1.patch

There are a few platform-specific assumptions in distributed shell (both main code and test code) that prevent it from working correctly on Windows.
[jira] [Updated] (YARN-472) MR app master deletes staging dir when sent a reboot command from the RM
[ https://issues.apache.org/jira/browse/YARN-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikas Saha updated YARN-472:

Issue Type: Bug (was: Sub-task)
Parent: (was: YARN-128)

MR app master deletes staging dir when sent a reboot command from the RM

Key: YARN-472
URL: https://issues.apache.org/jira/browse/YARN-472
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: jian he
Assignee: jian he
Attachments: YARN-472.1.patch, YARN-472.2.patch

If the RM is restarted when the MR job is running, then it sends a reboot command to the job. The job ends up deleting the staging dir and that causes the next attempt to fail.
[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608004#comment-13608004 ]

Hadoop QA commented on YARN-417:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12574596/YARN-417-5.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 tests included appear to have a timeout.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/550//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/550//console

This message is automatically generated.

Add a poller that allows the AM to receive notifications when it is assigned containers

Key: YARN-417
URL: https://issues.apache.org/jira/browse/YARN-417
Project: Hadoop YARN
Issue Type: Sub-task
Components: api, applications
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, YARN-417-1.patch, YARN-417-2.patch, YARN-417-3.patch, YARN-417-4.patch, YARN-417-4.patch, YARN-417-5.patch, YARN-417.patch, YarnAppMaster.java, YarnAppMasterListener.java

Writing AMs would be easier for some if they did not have to handle heartbeating to the RM on their own.
[jira] [Created] (YARN-493) TestContainerManager fails on Windows
Chris Nauroth created YARN-493:

Summary: TestContainerManager fails on Windows
Key: YARN-493
URL: https://issues.apache.org/jira/browse/YARN-493
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Fix For: 3.0.0

The tests contain some platform-specific assumptions, such as availability of bash for executing a command in a container and signals to check existence of a process and terminate it.
[jira] [Commented] (YARN-487) TestDiskFailures fails on Windows due to path mishandling
[ https://issues.apache.org/jira/browse/YARN-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608061#comment-13608061 ]

Arpit Agarwal commented on YARN-487:

+1 Verified on Windows and OS X.

TestDiskFailures fails on Windows due to path mishandling

Key: YARN-487
URL: https://issues.apache.org/jira/browse/YARN-487
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: YARN-487.1.patch

{{TestDiskFailures#testDirFailuresOnStartup}} fails due to insertion of an extra leading '/' on the path within {{LocalDirsHandlerService}} when running on Windows. The test assertions also fail to account for the fact that {{Path}} normalizes '\' to '/'.
[jira] [Commented] (YARN-491) TestContainerLogsPage fails on Windows
[ https://issues.apache.org/jira/browse/YARN-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608079#comment-13608079 ]

Arpit Agarwal commented on YARN-491:

+1 Verified on Windows and OS X. Thanks for all the YARN fixes!

TestContainerLogsPage fails on Windows

Key: YARN-491
URL: https://issues.apache.org/jira/browse/YARN-491
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: YARN-491.1.patch

{{TestContainerLogsPage}} contains some code for initializing a log directory that doesn't work correctly on Windows.
[jira] [Updated] (YARN-396) Rationalize AllocateResponse in RM scheduler API
[ https://issues.apache.org/jira/browse/YARN-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated YARN-396:

Hadoop Flags: Incompatible change, Reviewed

Rationalize AllocateResponse in RM scheduler API

Key: YARN-396
URL: https://issues.apache.org/jira/browse/YARN-396
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Zhijie Shen
Labels: incompatible
Attachments: YARN-396_1.patch, YARN-396_2.patch, YARN-396_3.patch, YARN-396_4.patch, YARN-396_5.patch

AllocateResponse contains an AMResponse and cluster node count; AMResponse holds the rest of the data. Unless there is a good reason for this object structure, there should be either AMResponse or AllocateResponse.
[jira] [Commented] (YARN-396) Rationalize AllocateResponse in RM scheduler API
[ https://issues.apache.org/jira/browse/YARN-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608139#comment-13608139 ]

Hitesh Shah commented on YARN-396:

Changes look good. Will commit shortly to trunk and branch-2.

Rationalize AllocateResponse in RM scheduler API

Key: YARN-396
URL: https://issues.apache.org/jira/browse/YARN-396
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Zhijie Shen
Labels: incompatible
Attachments: YARN-396_1.patch, YARN-396_2.patch, YARN-396_3.patch, YARN-396_4.patch, YARN-396_5.patch

AllocateResponse contains an AMResponse and cluster node count; AMResponse holds the rest of the data. Unless there is a good reason for this object structure, there should be either AMResponse or AllocateResponse.
[jira] [Commented] (YARN-396) Rationalize AllocateResponse in RM scheduler API
[ https://issues.apache.org/jira/browse/YARN-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608190#comment-13608190 ]

Hudson commented on YARN-396:

Integrated in Hadoop-trunk-Commit #3497 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3497/])
YARN-396. Rationalize AllocateResponse in RM Scheduler API. Contributed by Zhijie Shen. (Revision 1459040)

Result = SUCCESS
hitesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1459040
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/local/LocalContainerAllocator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerRequestor.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRAppBenchmark.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/AllocateResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetAllApplicationsResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetClusterNodesResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetQueueUserAclsInfoResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/AMResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/AMResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/AMRMClientImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestAMRMClient.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRecordFactory.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCResponseId.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestApplicationTokens.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java
*
[jira] [Commented] (YARN-297) Improve hashCode implementations for PB records
[ https://issues.apache.org/jira/browse/YARN-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608204#comment-13608204 ]

Hitesh Shah commented on YARN-297:

+1. Will commit shortly.

Improve hashCode implementations for PB records

Key: YARN-297
URL: https://issues.apache.org/jira/browse/YARN-297
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Xuan Gong
Attachments: YARN.297.1.patch, YARN-297.2.patch

As [~hsn] pointed out in YARN-2, we use very small primes in all our hashCode implementations.
[jira] [Commented] (YARN-379) yarn [node,application] command print logger info messages
[ https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608213#comment-13608213 ]

Ravi Prakash commented on YARN-379:

Can we not simply add

log4j.category.org.apache.hadoop.yarn.service.AbstractService=WARN

to the log4j.properties file? In my testing, this prevented INFO messages on the console but not in the log file for the AM (though I don't completely understand how that was possible). This is obviously dependent on my log4j.properties file, and I believe that is where it should be handled.

yarn [node,application] command print logger info messages

Key: YARN-379
URL: https://issues.apache.org/jira/browse/YARN-379
Project: Hadoop YARN
Issue Type: Bug
Components: client
Affects Versions: 2.0.3-alpha
Reporter: Thomas Graves
Assignee: Abhishek Kapoor
Labels: usability
Attachments: YARN-379.patch

Running the yarn node and yarn application commands results in annoying log info messages being printed:

$ yarn node -list
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
Total Nodes:1
Node-Id    Node-State    Node-Http-Address    Health-Status(isNodeHealthy)    Running-Containers
foo:8041    RUNNING    foo:8042    true    0
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.

$ yarn application
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
Invalid Command Usage :
usage: application
 -kill arg      Kills the application.
 -list          Lists all the Applications from RM.
 -status arg    Prints the status of the application.
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
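For readers following along, the same suppression can also be expressed programmatically with the log4j 1.x API that Hadoop clients of this era bundle. This is only an illustrative sketch under that assumption; the class name is made up and this is not code from any patch on this issue:

{code}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class QuietAbstractServiceLogging {
  public static void main(String[] args) {
    // Programmatic equivalent of the log4j.properties line
    // log4j.category.org.apache.hadoop.yarn.service.AbstractService=WARN:
    // raise the threshold for that logger so its "is inited/started/stopped"
    // INFO lines no longer reach the console appender.
    Logger.getLogger("org.apache.hadoop.yarn.service.AbstractService")
        .setLevel(Level.WARN);
  }
}
{code}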
[jira] [Commented] (YARN-470) Support a way to disable resource monitoring on the NodeManager
[ https://issues.apache.org/jira/browse/YARN-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608248#comment-13608248 ]

Hitesh Shah commented on YARN-470:

Comments:
- missing changes in yarn-default.xml
- question regarding NodeInfo which reports totalPMem and totalVMem - is the expectation that it should return the actual configured value or -1 if memory checks are disabled? The node information sent to the RM in the heartbeat is the actual amount, whereas the NM UI seems to be displaying something else. Does it make sense to add the memory monitoring flags as separate bits of information? This could allow a function like isPhysicalMemoryCheckEnabled to just return the flag instead of overloading the totalPMemValue when monitoring is disabled.
- is this needed: s/YarnConfiguration.DEFAULT_NM_PMEM_MB) * 1024 * 1024/YarnConfiguration.DEFAULT_NM_PMEM_MB) * 1024 * 1024L/ ? (missing long qualifier on the last 1024)

Support a way to disable resource monitoring on the NodeManager

Key: YARN-470
URL: https://issues.apache.org/jira/browse/YARN-470
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Hitesh Shah
Assignee: Siddharth Seth
Labels: usability
Attachments: YARN-470.txt

Currently, the memory management monitor's check is disabled when the maxMem is set to -1. However, the maxMem is also sent to the RM when the NM registers with it (to define the max limit of allocate-able resources). We need an explicit flag to disable monitoring to avoid the problems caused by the overloading of the max memory value.
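The last review point is about 32-bit overflow. A minimal, illustrative demo of why the trailing long qualifier matters is below; it assumes an 8192 MB default for the physical-memory setting and uses a made-up class name, so it is a sketch rather than code from the patch:

{code}
public class PmemOverflowDemo {
  public static void main(String[] args) {
    // Stand-in for YarnConfiguration.DEFAULT_NM_PMEM_MB (assumed to be 8192 here).
    int defaultNmPmemMb = 8192;

    // All three factors are ints, so the multiplication is done in 32-bit
    // arithmetic and wraps around before being widened to long.
    long wrong = defaultNmPmemMb * 1024 * 1024;    // prints 0 for 8192

    // Making the last factor a long promotes the whole expression to 64 bits.
    long right = defaultNmPmemMb * 1024 * 1024L;   // prints 8589934592

    System.out.println(wrong + " vs " + right);
  }
}
{code}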
[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608279#comment-13608279 ]

Bikas Saha commented on YARN-417:

IMO, the locking intent will be more clear if we set keepRunning inside the lock because essentially that is also a shared value that we are guarding. The client.allocate() and client.unregister() are themselves already synchronized on the inner rmClient.

{code}
+  public void unregisterApplicationMaster(FinalApplicationStatus appStatus,
+      String appMessage, String appTrackingUrl) throws YarnRemoteException {
+    keepRunning = false;
+    synchronized (client) {
+      client.unregisterApplicationMaster(appStatus, appMessage, appTrackingUrl);
+    }
{code}

I guess now the outer while loop can actually become a while(true) with the inner check for if(keepRunning) causing a break when it fails. I like this pattern because then, when I read the code, I clearly see that the outer loop is purely a run-to-infinity loop and I don't have to keep that condition in mind when I try to grok the inner if condition that actually controls the loop action. What do you think?

{code}
+    while (keepRunning) {
+      try {
+        AllocateResponse response;
+        synchronized (client) {
+          // ensure we don't send heartbeats after unregistering
+          if (keepRunning) {
+            response = client.allocate(progress);
{code}

Your comments on usage of the async client don't mention anything about the callbacks in the exemplary code flow (which is essentially the new changes in this jira) :)

The patch needs to be rebased because YARN-396 went in, which merged AMResponse into AllocateResponse.

Add a poller that allows the AM to receive notifications when it is assigned containers

Key: YARN-417
URL: https://issues.apache.org/jira/browse/YARN-417
Project: Hadoop YARN
Issue Type: Sub-task
Components: api, applications
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, YARN-417-1.patch, YARN-417-2.patch, YARN-417-3.patch, YARN-417-4.patch, YARN-417-4.patch, YARN-417-5.patch, YARN-417.patch, YarnAppMaster.java, YarnAppMasterListener.java

Writing AMs would be easier for some if they did not have to handle heartbeating to the RM on their own.
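To make the pattern under discussion concrete, here is a self-contained illustrative sketch of a run-forever heartbeat loop where the keepRunning flag is both set and checked under the same client lock. The RMProxy type and its methods are stand-ins invented for the sketch; they are not the real AMRMClient API or the code in the attached patches:

{code}
public class HeartbeatLoopSketch {

  // Hypothetical stand-in for the synchronized AMRMClient wrapper.
  interface RMProxy {
    void allocate(float progress);
    void unregister();
  }

  private final RMProxy client;
  private volatile boolean keepRunning = true;

  HeartbeatLoopSketch(RMProxy client) {
    this.client = client;
  }

  public void unregisterApplicationMaster() {
    synchronized (client) {
      // Flip the flag inside the lock so it is guarded by the same monitor
      // that protects the heartbeat call.
      keepRunning = false;
      client.unregister();
    }
  }

  public void heartbeatLoop(float progress) throws InterruptedException {
    while (true) {                    // pure run-forever loop ...
      synchronized (client) {
        if (!keepRunning) {
          break;                      // ... the flag alone decides when to stop
        }
        // No heartbeat can be sent after unregisterApplicationMaster() has run,
        // because both paths synchronize on the same client object.
        client.allocate(progress);
      }
      Thread.sleep(1000);             // heartbeat interval, arbitrary for the sketch
    }
  }
}
{code}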
[jira] [Commented] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608333#comment-13608333 ]

omkar vinit joshi commented on YARN-112:

This problem is occurring mainly because the createDir call on FileContext does not throw an exception when the file system is RawLocalFileSystem. So if the directory is already present, the new createDir will silently return instead of throwing an exception. This causes the race condition when two containers try to localize at the same time and get the same random number. However, the rename call is atomic, and to avoid the race condition we should use it.

Earlier implementation:
1) generate random num (r1)
2) check if r1 is present; if present go to 1), else go to 3)
3) create directories r1 and r1_tmp
4) copy the files into r1_tmp
5) rename r1_tmp to r1 (This is an atomic call and only one thread will succeed. The rest of them will fail. The error listed is just one of the errors which might be logged.)

Suggested fix:
1) generate random num (r1)
2) check if r1 is present; if present go to 1), else go to 3)
3) create dir r1
4) rename r1 to r1_tmp (only one will succeed; the rest of the threads will get an exception and will continue to 1)
5) check if there exists a file inside r1_tmp; if present, rename it back to r1 and go to 1), else go to 6). (This check is added because if we get threads with the same random number that pass check 2, one thread may completely finish the download, in which case it will rename r1_tmp back to r1; so for the other thread which now comes into the picture, the rename call (r1 to r1_tmp) will succeed. However this should be avoided, and we can avoid it by checking the contents of r1_tmp.)
6) create r1
7) continue with the actual file download.
8) rename r1_tmp to r1.

Race in localization can cause containers to fail

Key: YARN-112
URL: https://issues.apache.org/jira/browse/YARN-112
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: omkar vinit joshi

On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up.
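An illustrative sketch of the "atomic rename as a claim" idea from the suggested fix above, written against plain java.nio.file rather than Hadoop's FileContext. The helper name and directory layout are made up for the sketch and are not taken from any attached patch; the point is only that when several threads pick the same random number, exactly one can win the rename:

{code}
import java.io.IOException;
import java.nio.file.*;

public class LocalizerRenameSketch {

  /**
   * Tries to claim a download directory for the given random id.
   * Returns the claimed tmp directory, or null if this thread lost the race
   * and should retry with a new random id.
   */
  static Path claimDownloadDir(Path cacheRoot, long randomId) throws IOException {
    Path target = cacheRoot.resolve(Long.toString(randomId));
    Path tmp = cacheRoot.resolve(randomId + "_tmp");

    Files.createDirectories(target);              // step 3: create dir r1 (silently ok if it exists)
    try {
      // Step 4: atomic rename r1 -> r1_tmp; only one of the racing threads succeeds.
      Files.move(target, tmp, StandardCopyOption.ATOMIC_MOVE);
    } catch (IOException lostRace) {
      return null;                                 // another thread claimed it; caller retries
    }
    try (DirectoryStream<Path> contents = Files.newDirectoryStream(tmp)) {
      // Step 5: if r1_tmp already has content, an earlier download with the same id
      // finished; put the name back and retry with a new id.
      if (contents.iterator().hasNext()) {
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        return null;
      }
    }
    return tmp;                                    // steps 6-8 (download, rename back) happen in the caller
  }
}
{code}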
[jira] [Commented] (YARN-109) .tmp file is not deleted for localized archives
[ https://issues.apache.org/jira/browse/YARN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608378#comment-13608378 ]

Mayank Bansal commented on YARN-109:

[~ojoshi] Good points. Adding another patch.

Thanks,
Mayank

.tmp file is not deleted for localized archives

Key: YARN-109
URL: https://issues.apache.org/jira/browse/YARN-109
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
Assignee: Mayank Bansal
Attachments: YARN-109-trunk-1.patch, YARN-109-trunk-2.patch, YARN-109-trunk.patch

When archives are localized they are initially created as a .tmp file and unpacked from that file. However the .tmp file is not deleted afterwards.
[jira] [Updated] (YARN-109) .tmp file is not deleted for localized archives
[ https://issues.apache.org/jira/browse/YARN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated YARN-109:

Attachment: YARN-109-trunk-2.patch

.tmp file is not deleted for localized archives

Key: YARN-109
URL: https://issues.apache.org/jira/browse/YARN-109
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
Assignee: Mayank Bansal
Attachments: YARN-109-trunk-1.patch, YARN-109-trunk-2.patch, YARN-109-trunk.patch

When archives are localized they are initially created as a .tmp file and unpacked from that file. However the .tmp file is not deleted afterwards.
[jira] [Commented] (YARN-488) TestContainerManagerSecurity fails on Windows
[ https://issues.apache.org/jira/browse/YARN-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608438#comment-13608438 ]

Hitesh Shah commented on YARN-488:

{code}
if (inputClassPath != null)
{code}

Does it make sense to change this to:

{code}
if (inputClassPath != null && !inputClassPath.isEmpty())
{code}

TestContainerManagerSecurity fails on Windows

Key: YARN-488
URL: https://issues.apache.org/jira/browse/YARN-488
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: YARN-488.1.patch

These tests are failing to launch containers correctly when running on Windows.
[jira] [Commented] (YARN-470) Support a way to disable resource monitoring on the NodeManager
[ https://issues.apache.org/jira/browse/YARN-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608449#comment-13608449 ]

Hadoop QA commented on YARN-470:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12574683/YARN-470_2.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified test files.
+1 tests included appear to have a timeout.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/552//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/552//console

This message is automatically generated.

Support a way to disable resource monitoring on the NodeManager

Key: YARN-470
URL: https://issues.apache.org/jira/browse/YARN-470
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Hitesh Shah
Assignee: Siddharth Seth
Labels: usability
Attachments: YARN-470_2.txt, YARN-470.txt

Currently, the memory management monitor's check is disabled when the maxMem is set to -1. However, the maxMem is also sent to the RM when the NM registers with it (to define the max limit of allocate-able resources). We need an explicit flag to disable monitoring to avoid the problems caused by the overloading of the max memory value.
[jira] [Updated] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-417:

Attachment: YARN-417-6.patch

Updated patch makes the changes suggested in Bikas's comment, including the rebase. For the while loop, I moved things around a little in a way that seems more clear to me.

Add a poller that allows the AM to receive notifications when it is assigned containers

Key: YARN-417
URL: https://issues.apache.org/jira/browse/YARN-417
Project: Hadoop YARN
Issue Type: Sub-task
Components: api, applications
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, YARN-417-1.patch, YARN-417-2.patch, YARN-417-3.patch, YARN-417-4.patch, YARN-417-4.patch, YARN-417-5.patch, YARN-417-6.patch, YARN-417.patch, YarnAppMaster.java, YarnAppMasterListener.java

Writing AMs would be easier for some if they did not have to handle heartbeating to the RM on their own.
[jira] [Commented] (YARN-24) Nodemanager fails to start if log aggregation enabled and namenode unavailable
[ https://issues.apache.org/jira/browse/YARN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608522#comment-13608522 ]

Sandy Ryza commented on YARN-24:

I verified this on a pseudo-distributed cluster in the following way:
* Started up yarn expecting a namenode port of 7654.
* Started up HDFS with default namenode port of 9000.
* Ran a pi job.
* Verified that log aggregation failed because the nodemanager couldn't find the namenode.
* Restarted HDFS with the namenode port 7654.
* Ran another YARN job.
* Verified that the logs from the second job showed up in the UI and that the logs from the first job didn't.

Nodemanager fails to start if log aggregation enabled and namenode unavailable

Key: YARN-24
URL: https://issues.apache.org/jira/browse/YARN-24
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
Assignee: Sandy Ryza
Attachments: YARN-24-1.patch, YARN-24-2.patch, YARN-24.patch

If log aggregation is enabled and the namenode is currently unavailable, the nodemanager fails to startup.
[jira] [Assigned] (YARN-99) Jobs fail during resource localization when directories in file cache reaches to unix directory limit
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

omkar vinit joshi reassigned YARN-99:

Assignee: omkar vinit joshi (was: Devaraj K)

Jobs fail during resource localization when directories in file cache reaches to unix directory limit

Key: YARN-99
URL: https://issues.apache.org/jira/browse/YARN-99
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Devaraj K
Assignee: omkar vinit joshi

If we have multiple jobs which use the distributed cache with small files, the directory limit is reached before the cache size limit, and no further directories can be created in the file cache. The jobs start failing with the below exception.

{code:xml}
java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
	at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{code}

We should have a mechanism to clean the cache files if it crosses a specified number of directories, like the cache size.
[jira] [Updated] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-378:

Attachment: YARN-378_10.patch

Clean up some whitespace characters.

ApplicationMaster retry times should be set by Client

Key: YARN-378
URL: https://issues.apache.org/jira/browse/YARN-378
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
Labels: usability
Attachments: YARN-378_10.patch, YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, YARN-378_7.patch, YARN-378_8.patch, YARN-378_9.patch

We should support different ApplicationMaster retry times for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client.
[jira] [Updated] (YARN-488) TestContainerManagerSecurity fails on Windows
[ https://issues.apache.org/jira/browse/YARN-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated YARN-488:

Attachment: YARN-488.2.patch

Thanks, Hitesh. AFAIK, there is no significant difference between no classpath and an empty classpath, so I do think it's correct to change the condition to check for empty string. Here is a new patch that does that.

TestContainerManagerSecurity fails on Windows

Key: YARN-488
URL: https://issues.apache.org/jira/browse/YARN-488
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: YARN-488.1.patch, YARN-488.2.patch

These tests are failing to launch containers correctly when running on Windows.
[jira] [Updated] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-378:

Attachment: YARN-378_MAPREDUCE-5062.patch

Combine the latest patches of YARN-378 and MAPREDUCE-5062 to allow Jenkins to run and verify them.

ApplicationMaster retry times should be set by Client

Key: YARN-378
URL: https://issues.apache.org/jira/browse/YARN-378
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
Labels: usability
Attachments: YARN-378_10.patch, YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, YARN-378_7.patch, YARN-378_8.patch, YARN-378_9.patch, YARN-378_MAPREDUCE-5062.patch

We should support different ApplicationMaster retry times for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client.
[jira] [Commented] (YARN-488) TestContainerManagerSecurity fails on Windows
[ https://issues.apache.org/jira/browse/YARN-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608615#comment-13608615 ]

Hadoop QA commented on YARN-488:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12574718/YARN-488.2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 tests included appear to have a timeout.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/553//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/553//console

This message is automatically generated.

TestContainerManagerSecurity fails on Windows

Key: YARN-488
URL: https://issues.apache.org/jira/browse/YARN-488
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: YARN-488.1.patch, YARN-488.2.patch

These tests are failing to launch containers correctly when running on Windows.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608632#comment-13608632 ]

Xuan Gong commented on YARN-479:

Couple of comments on the latest one (479-2):

1. In the while(true) loop at NodeStatusUpdaterImpl : startStatusUpdater(): rmRetryCount++ and response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse() can be in the try block; others, such as NodeStatus nodeStatus = getNodeStatus(), etc., I think we can move out of the while(true) loop. We only consider losing the heartbeat response.
2. Please re-phrase the warning message and error message for more clarity - something along the lines of "did not get the heartbeat response ...".
3. testNMRegistration may not be a good place to test the changes. You can write your own ResourceTracker and NodeStatusUpdater to mimic the heartbeat response loss, and test whether your code can handle it properly. Take a look at the MyNodeStatusUpdater and MyResourceTracker classes; they show how to do that.

NM retry behavior for connection to RM should be similar for lost heartbeats

Key: YARN-479
URL: https://issues.apache.org/jira/browse/YARN-479
Project: Hadoop YARN
Issue Type: Bug
Reporter: Hitesh Shah
Assignee: jian he
Attachments: YARN-479.1.patch, YARN-479.2.patch

Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow.
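For item 3, the shape of such a test double is sketched below. This is purely illustrative: the ResourceTracker and HeartbeatResponse types here are simplified stand-ins invented for the sketch, not the real org.apache.hadoop.yarn.server.api classes used by MyResourceTracker in the existing tests:

{code}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class LostHeartbeatTrackerSketch {

  interface HeartbeatResponse { }

  interface ResourceTracker {
    HeartbeatResponse nodeHeartbeat(Object request) throws IOException;
  }

  /** Fails every Nth heartbeat so the NM's retry path can be exercised in a unit test. */
  static class FlakyResourceTracker implements ResourceTracker {
    private final AtomicInteger calls = new AtomicInteger();
    private final int failEvery;

    FlakyResourceTracker(int failEvery) {
      this.failEvery = failEvery;
    }

    @Override
    public HeartbeatResponse nodeHeartbeat(Object request) throws IOException {
      if (calls.incrementAndGet() % failEvery == 0) {
        // Simulate a lost response: the RM may have processed the heartbeat,
        // but the reply never made it back to the NM.
        throw new IOException("simulated lost heartbeat response");
      }
      return new HeartbeatResponse() { };
    }
  }
}
{code}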
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608634#comment-13608634 ]

Xuan Gong commented on YARN-479:

Minor question: why add Assert response != null? Trying to test a post-condition here? If response == null, what will happen? I mean, if response == null, the following code response.getNodeAction() will give an error anyway.

NM retry behavior for connection to RM should be similar for lost heartbeats

Key: YARN-479
URL: https://issues.apache.org/jira/browse/YARN-479
Project: Hadoop YARN
Issue Type: Bug
Reporter: Hitesh Shah
Assignee: jian he
Attachments: YARN-479.1.patch, YARN-479.2.patch

Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow.
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608642#comment-13608642 ]

Hadoop QA commented on YARN-378:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12574721/YARN-378_MAPREDUCE-5062.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 11 new or modified test files.
+1 tests included appear to have a timeout.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 eclipse:eclipse. The patch failed to build with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/554//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/554//console

This message is automatically generated.

ApplicationMaster retry times should be set by Client

Key: YARN-378
URL: https://issues.apache.org/jira/browse/YARN-378
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
Labels: usability
Attachments: YARN-378_10.patch, YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, YARN-378_7.patch, YARN-378_8.patch, YARN-378_9.patch, YARN-378_MAPREDUCE-5062.patch

We should support different ApplicationMaster retry times for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client.