[jira] [Commented] (YARN-200) yarn log does not output all needed information, and is in a binary format
[ https://issues.apache.org/jira/browse/YARN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607496#comment-13607496 ]

Hudson commented on YARN-200:

Integrated in Hadoop-Yarn-trunk #161 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/161/])
YARN-200. yarn log does not output all needed information, and is in a binary format. Contributed by Ravi Prakash (Revision 1458466)

Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458466
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogDumper.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java

yarn log does not output all needed information, and is in a binary format

Key: YARN-200
URL: https://issues.apache.org/jira/browse/YARN-200
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 0.23.5
Reporter: Robert Joseph Evans
Assignee: Ravi Prakash
Labels: usability
Fix For: 0.23.7, 2.0.5-beta
Attachments: YARN-200.patch, YARN-200.patch

yarn logs does not output attemptid, nodename, or container-id. Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts. Also, the output currently includes several binary characters. This is OK for being machine readable, but difficult for being human readable, or even for using standard tools like grep. The help message could also be made more useful to users.
[jira] [Commented] (YARN-200) yarn log does not output all needed information, and is in a binary format
[ https://issues.apache.org/jira/browse/YARN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607564#comment-13607564 ]

Hudson commented on YARN-200:

Integrated in Hadoop-Hdfs-0.23-Build #559 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/559/])
svn merge -c 1458466 FIXES: YARN-200. yarn log does not output all needed information, and is in a binary format. Contributed by Ravi Prakash (Revision 1458474)

Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458474
Files :
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogDumper.java
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java

yarn log does not output all needed information, and is in a binary format

Key: YARN-200
URL: https://issues.apache.org/jira/browse/YARN-200
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 0.23.5
Reporter: Robert Joseph Evans
Assignee: Ravi Prakash
Labels: usability
Fix For: 0.23.7, 2.0.5-beta
Attachments: YARN-200.patch, YARN-200.patch

yarn logs does not output attemptid, nodename, or container-id. Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts. Also, the output currently includes several binary characters. This is OK for being machine readable, but difficult for being human readable, or even for using standard tools like grep. The help message could also be made more useful to users.
[jira] [Commented] (YARN-200) yarn log does not output all needed information, and is in a binary format
[ https://issues.apache.org/jira/browse/YARN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607571#comment-13607571 ]

Hudson commented on YARN-200:

Integrated in Hadoop-Hdfs-trunk #1350 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1350/])
YARN-200. yarn log does not output all needed information, and is in a binary format. Contributed by Ravi Prakash (Revision 1458466)

Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458466
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogDumper.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java

yarn log does not output all needed information, and is in a binary format

Key: YARN-200
URL: https://issues.apache.org/jira/browse/YARN-200
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 0.23.5
Reporter: Robert Joseph Evans
Assignee: Ravi Prakash
Labels: usability
Fix For: 0.23.7, 2.0.5-beta
Attachments: YARN-200.patch, YARN-200.patch

yarn logs does not output attemptid, nodename, or container-id. Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts. Also, the output currently includes several binary characters. This is OK for being machine readable, but difficult for being human readable, or even for using standard tools like grep. The help message could also be made more useful to users.
[jira] [Commented] (YARN-200) yarn log does not output all needed information, and is in a binary format
[ https://issues.apache.org/jira/browse/YARN-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607627#comment-13607627 ]

Hudson commented on YARN-200:

Integrated in Hadoop-Mapreduce-trunk #1378 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1378/])
YARN-200. yarn log does not output all needed information, and is in a binary format. Contributed by Ravi Prakash (Revision 1458466)

Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458466
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogDumper.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java

yarn log does not output all needed information, and is in a binary format

Key: YARN-200
URL: https://issues.apache.org/jira/browse/YARN-200
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 0.23.5
Reporter: Robert Joseph Evans
Assignee: Ravi Prakash
Labels: usability
Fix For: 0.23.7, 2.0.5-beta
Attachments: YARN-200.patch, YARN-200.patch

yarn logs does not output attemptid, nodename, or container-id. Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts. Also, the output currently includes several binary characters. This is OK for being machine readable, but difficult for being human readable, or even for using standard tools like grep. The help message could also be made more useful to users.
[jira] [Commented] (YARN-492) Too many open files error to launch a container
[ https://issues.apache.org/jira/browse/YARN-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607812#comment-13607812 ]

Hitesh Shah commented on YARN-492:

Please add details of which processes are using the ports in question. In addition to that, what configuration value was set to make use of port 50010 and/or 44871?

Too many open files error to launch a container

Key: YARN-492
URL: https://issues.apache.org/jira/browse/YARN-492
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.0.0-alpha
Environment: RedHat Linux
Reporter: Krishna Kishore Bonagiri

I am running a date command with YARN's distributed shell example in a loop of 1000 times in this way:

yarn jar /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar org.apache.hadoop.yarn.applications.distributedshell.Client --jar /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar --shell_command date --num_containers 2

Around the 730th time or so, I get an error in the node manager's log saying that it failed to launch a container because there are "Too many open files". When I observe through the lsof command, I find that one instance of this kind of file is left open for each run of the Application Master, and it keeps growing as I run the loop:

node1:44871->node1:50010

Thanks,
Kishore
[jira] [Commented] (YARN-492) Too many open files error to launch a container
[ https://issues.apache.org/jira/browse/YARN-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607819#comment-13607819 ]

Hitesh Shah commented on YARN-492:

Nevermind, 50010 is the default datanode port. What process is opening up 44871? If it is the node manager, do you have log aggregation enabled? Could you try running the test with log aggregation disabled and let us know if the problem is still reproducible?

Too many open files error to launch a container

Key: YARN-492
URL: https://issues.apache.org/jira/browse/YARN-492
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.0.0-alpha
Environment: RedHat Linux
Reporter: Krishna Kishore Bonagiri

I am running a date command with YARN's distributed shell example in a loop of 1000 times in this way:

yarn jar /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar org.apache.hadoop.yarn.applications.distributedshell.Client --jar /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar --shell_command date --num_containers 2

Around the 730th time or so, I get an error in the node manager's log saying that it failed to launch a container because there are "Too many open files". When I observe through the lsof command, I find that one instance of this kind of file is left open for each run of the Application Master, and it keeps growing as I run the loop:

node1:44871->node1:50010

Thanks,
Kishore
[jira] [Assigned] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

omkar vinit joshi reassigned YARN-112:

Assignee: omkar vinit joshi

Race in localization can cause containers to fail

Key: YARN-112
URL: https://issues.apache.org/jira/browse/YARN-112
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: omkar vinit joshi

On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up.
[jira] [Updated] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jian he updated YARN-479:

Attachment: YARN-479.1.patch

NM retry behavior for connection to RM should be similar for lost heartbeats

Key: YARN-479
URL: https://issues.apache.org/jira/browse/YARN-479
Project: Hadoop YARN
Issue Type: Bug
Reporter: Hitesh Shah
Assignee: jian he
Attachments: YARN-479.1.patch

Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607910#comment-13607910 ]

Hadoop QA commented on YARN-479:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12574579/YARN-479.1.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
  org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/549//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/549//console

This message is automatically generated.

NM retry behavior for connection to RM should be similar for lost heartbeats

Key: YARN-479
URL: https://issues.apache.org/jira/browse/YARN-479
Project: Hadoop YARN
Issue Type: Bug
Reporter: Hitesh Shah
Assignee: jian he
Attachments: YARN-479.1.patch

Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow.
[jira] [Updated] (YARN-490) TestDistributedShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated YARN-490:

Labels: windows (was: )

TestDistributedShell fails on Windows

Key: YARN-490
URL: https://issues.apache.org/jira/browse/YARN-490
Project: Hadoop YARN
Issue Type: Bug
Components: applications/distributed-shell
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Labels: windows
Attachments: YARN-490.1.patch

There are a few platform-specific assumptions in distributed shell (both main code and test code) that prevent it from working correctly on Windows.
[jira] [Updated] (YARN-472) MR app master deletes staging dir when sent a reboot command from the RM
[ https://issues.apache.org/jira/browse/YARN-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikas Saha updated YARN-472:

Issue Type: Bug (was: Sub-task)
Parent: (was: YARN-128)

MR app master deletes staging dir when sent a reboot command from the RM

Key: YARN-472
URL: https://issues.apache.org/jira/browse/YARN-472
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: jian he
Assignee: jian he
Attachments: YARN-472.1.patch, YARN-472.2.patch

If the RM is restarted when the MR job is running, then it sends a reboot command to the job. The job ends up deleting the staging dir and that causes the next attempt to fail.
[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608004#comment-13608004 ]

Hadoop QA commented on YARN-417:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12574596/YARN-417-5.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 tests included appear to have a timeout.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/550//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/550//console

This message is automatically generated.

Add a poller that allows the AM to receive notifications when it is assigned containers

Key: YARN-417
URL: https://issues.apache.org/jira/browse/YARN-417
Project: Hadoop YARN
Issue Type: Sub-task
Components: api, applications
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, YARN-417-1.patch, YARN-417-2.patch, YARN-417-3.patch, YARN-417-4.patch, YARN-417-4.patch, YARN-417-5.patch, YARN-417.patch, YarnAppMaster.java, YarnAppMasterListener.java

Writing AMs would be easier for some if they did not have to handle heartbeating to the RM on their own.
[jira] [Created] (YARN-493) TestContainerManager fails on Windows
Chris Nauroth created YARN-493:

Summary: TestContainerManager fails on Windows
Key: YARN-493
URL: https://issues.apache.org/jira/browse/YARN-493
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Fix For: 3.0.0

The tests contain some platform-specific assumptions, such as availability of bash for executing a command in a container and signals to check existence of a process and terminate it.
[jira] [Commented] (YARN-487) TestDiskFailures fails on Windows due to path mishandling
[ https://issues.apache.org/jira/browse/YARN-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608061#comment-13608061 ]

Arpit Agarwal commented on YARN-487:

+1 Verified on Windows and OS X.

TestDiskFailures fails on Windows due to path mishandling

Key: YARN-487
URL: https://issues.apache.org/jira/browse/YARN-487
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: YARN-487.1.patch

{{TestDiskFailures#testDirFailuresOnStartup}} fails due to insertion of an extra leading '/' on the path within {{LocalDirsHandlerService}} when running on Windows. The test assertions also fail to account for the fact that {{Path}} normalizes '\' to '/'.
[jira] [Commented] (YARN-491) TestContainerLogsPage fails on Windows
[ https://issues.apache.org/jira/browse/YARN-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608079#comment-13608079 ]

Arpit Agarwal commented on YARN-491:

+1 Verified on Windows and OS X. Thanks for all the YARN fixes!

TestContainerLogsPage fails on Windows

Key: YARN-491
URL: https://issues.apache.org/jira/browse/YARN-491
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: YARN-491.1.patch

{{TestContainerLogsPage}} contains some code for initializing a log directory that doesn't work correctly on Windows.
[jira] [Updated] (YARN-396) Rationalize AllocateResponse in RM scheduler API
[ https://issues.apache.org/jira/browse/YARN-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated YARN-396:

Hadoop Flags: Incompatible change, Reviewed

Rationalize AllocateResponse in RM scheduler API

Key: YARN-396
URL: https://issues.apache.org/jira/browse/YARN-396
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Zhijie Shen
Labels: incompatible
Attachments: YARN-396_1.patch, YARN-396_2.patch, YARN-396_3.patch, YARN-396_4.patch, YARN-396_5.patch

AllocateResponse contains an AMResponse and cluster node count; AMResponse holds the rest of the data. Unless there is a good reason for this object structure, there should be either AMResponse or AllocateResponse.
[jira] [Commented] (YARN-396) Rationalize AllocateResponse in RM scheduler API
[ https://issues.apache.org/jira/browse/YARN-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608139#comment-13608139 ]

Hitesh Shah commented on YARN-396:

Changes look good. Will commit shortly to trunk and branch-2.

Rationalize AllocateResponse in RM scheduler API

Key: YARN-396
URL: https://issues.apache.org/jira/browse/YARN-396
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Zhijie Shen
Labels: incompatible
Attachments: YARN-396_1.patch, YARN-396_2.patch, YARN-396_3.patch, YARN-396_4.patch, YARN-396_5.patch

AllocateResponse contains an AMResponse and cluster node count; AMResponse holds the rest of the data. Unless there is a good reason for this object structure, there should be either AMResponse or AllocateResponse.
[jira] [Commented] (YARN-396) Rationalize AllocateResponse in RM scheduler API
[ https://issues.apache.org/jira/browse/YARN-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608190#comment-13608190 ]

Hudson commented on YARN-396:

Integrated in Hadoop-trunk-Commit #3497 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3497/])
YARN-396. Rationalize AllocateResponse in RM Scheduler API. Contributed by Zhijie Shen. (Revision 1459040)

Result = SUCCESS
hitesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1459040
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/local/LocalContainerAllocator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerRequestor.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRAppBenchmark.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/AllocateResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetAllApplicationsResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetClusterNodesResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetQueueUserAclsInfoResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/AMResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/AMResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/AMRMClientImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestAMRMClient.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRecordFactory.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCResponseId.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestApplicationTokens.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java
*
[jira] [Commented] (YARN-297) Improve hashCode implementations for PB records
[ https://issues.apache.org/jira/browse/YARN-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608204#comment-13608204 ]

Hitesh Shah commented on YARN-297:

+1. Will commit shortly.

Improve hashCode implementations for PB records

Key: YARN-297
URL: https://issues.apache.org/jira/browse/YARN-297
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Xuan Gong
Attachments: YARN.297.1.patch, YARN-297.2.patch

As [~hsn] pointed out in YARN-2, we use very small primes in all our hashCode implementations.
[jira] [Commented] (YARN-379) yarn [node,application] command print logger info messages
[ https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608213#comment-13608213 ]

Ravi Prakash commented on YARN-379:

Can we not simply add

log4j.category.org.apache.hadoop.yarn.service.AbstractService=WARN

to the log4j.properties file? In my testing, this prevented INFO messages on the console but not in the log file for the AM (though I don't completely understand how that was possible). This is obviously dependent on my log4j.properties file, and I believe that is where it should be handled.

yarn [node,application] command print logger info messages

Key: YARN-379
URL: https://issues.apache.org/jira/browse/YARN-379
Project: Hadoop YARN
Issue Type: Bug
Components: client
Affects Versions: 2.0.3-alpha
Reporter: Thomas Graves
Assignee: Abhishek Kapoor
Labels: usability
Attachments: YARN-379.patch

Running the yarn node and yarn application commands results in annoying log info messages being printed:

$ yarn node -list
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
Total Nodes:1
Node-Id    Node-State    Node-Http-Address    Health-Status(isNodeHealthy)    Running-Containers
foo:8041    RUNNING    foo:8042    true    0
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.

$ yarn application
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
Invalid Command Usage :
usage: application
 -kill arg      Kills the application.
 -list          Lists all the Applications from RM.
 -status arg    Prints the status of the application.
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
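For readers following along, the same suppression can also be expressed programmatically with the log4j 1.x API that Hadoop clients of this era bundle. This is only an illustrative sketch under that assumption; the class name is made up and this is not code from any patch on this issue:

{code}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class QuietAbstractServiceLogging {
  public static void main(String[] args) {
    // Programmatic equivalent of the log4j.properties line
    // log4j.category.org.apache.hadoop.yarn.service.AbstractService=WARN:
    // raise the threshold for that logger so its "is inited/started/stopped"
    // INFO lines no longer reach the console appender.
    Logger.getLogger("org.apache.hadoop.yarn.service.AbstractService")
        .setLevel(Level.WARN);
  }
}
{code}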
[jira] [Commented] (YARN-470) Support a way to disable resource monitoring on the NodeManager
[ https://issues.apache.org/jira/browse/YARN-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608248#comment-13608248 ]

Hitesh Shah commented on YARN-470:

Comments:
- missing changes in yarn-default.xml
- question regarding NodeInfo which reports totalPMem and totalVMem - is the expectation that it should return the actual configured value or -1 if memory checks are disabled? The node information sent to the RM in the heartbeat is the actual amount, whereas the NM UI seems to be displaying something else. Does it make sense to add the memory monitoring flags as separate bits of information? This could allow a function like isPhysicalMemoryCheckEnabled to just return the flag instead of overloading the totalPMemValue when monitoring is disabled.
- is this needed: s/YarnConfiguration.DEFAULT_NM_PMEM_MB) * 1024 * 1024/YarnConfiguration.DEFAULT_NM_PMEM_MB) * 1024 * 1024L/ ? (missing long qualifier on the last 1024)

Support a way to disable resource monitoring on the NodeManager

Key: YARN-470
URL: https://issues.apache.org/jira/browse/YARN-470
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Hitesh Shah
Assignee: Siddharth Seth
Labels: usability
Attachments: YARN-470.txt

Currently, the memory management monitor's check is disabled when the maxMem is set to -1. However, the maxMem is also sent to the RM when the NM registers with it (to define the max limit of allocate-able resources). We need an explicit flag to disable monitoring to avoid the problems caused by the overloading of the max memory value.
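The last review point is about 32-bit overflow. A minimal, illustrative demo of why the trailing long qualifier matters is below; it assumes an 8192 MB default for the physical-memory setting and uses a made-up class name, so it is a sketch rather than code from the patch:

{code}
public class PmemOverflowDemo {
  public static void main(String[] args) {
    // Stand-in for YarnConfiguration.DEFAULT_NM_PMEM_MB (assumed to be 8192 here).
    int defaultNmPmemMb = 8192;

    // All three factors are ints, so the multiplication is done in 32-bit
    // arithmetic and wraps around before being widened to long.
    long wrong = defaultNmPmemMb * 1024 * 1024;    // prints 0 for 8192

    // Making the last factor a long promotes the whole expression to 64 bits.
    long right = defaultNmPmemMb * 1024 * 1024L;   // prints 8589934592

    System.out.println(wrong + " vs " + right);
  }
}
{code}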
[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608279#comment-13608279 ]

Bikas Saha commented on YARN-417:

IMO, the locking intent will be more clear if we set keepRunning inside the lock because essentially that is also a shared value that we are guarding. The client.allocate() and client.unregister() are themselves already synchronized on the inner rmClient.

{code}
+  public void unregisterApplicationMaster(FinalApplicationStatus appStatus,
+      String appMessage, String appTrackingUrl) throws YarnRemoteException {
+    keepRunning = false;
+    synchronized (client) {
+      client.unregisterApplicationMaster(appStatus, appMessage, appTrackingUrl);
+    }
{code}

I guess now the outer while loop can actually become a while(true) with the inner check for if(keepRunning) causing a break when it fails. I like this pattern because then, when I read the code, I clearly see that the outer loop is purely a run-to-infinity loop and I don't have to keep that condition in mind when I try to grok the inner if condition that actually controls the loop action. What do you think?

{code}
+    while (keepRunning) {
+      try {
+        AllocateResponse response;
+        synchronized (client) {
+          // ensure we don't send heartbeats after unregistering
+          if (keepRunning) {
+            response = client.allocate(progress);
{code}

Your comments on usage of the async client don't mention anything about the callbacks in the exemplary code flow (which is essentially the new changes in this jira) :)

The patch needs to be rebased because YARN-396 went in, which merged AMResponse into AllocateResponse.

Add a poller that allows the AM to receive notifications when it is assigned containers

Key: YARN-417
URL: https://issues.apache.org/jira/browse/YARN-417
Project: Hadoop YARN
Issue Type: Sub-task
Components: api, applications
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, YARN-417-1.patch, YARN-417-2.patch, YARN-417-3.patch, YARN-417-4.patch, YARN-417-4.patch, YARN-417-5.patch, YARN-417.patch, YarnAppMaster.java, YarnAppMasterListener.java

Writing AMs would be easier for some if they did not have to handle heartbeating to the RM on their own.
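To make the pattern under discussion concrete, here is a self-contained illustrative sketch of a run-forever heartbeat loop where the keepRunning flag is both set and checked under the same client lock. The RMProxy type and its methods are stand-ins invented for the sketch; they are not the real AMRMClient API or the code in the attached patches:

{code}
public class HeartbeatLoopSketch {

  // Hypothetical stand-in for the synchronized AMRMClient wrapper.
  interface RMProxy {
    void allocate(float progress);
    void unregister();
  }

  private final RMProxy client;
  private volatile boolean keepRunning = true;

  HeartbeatLoopSketch(RMProxy client) {
    this.client = client;
  }

  public void unregisterApplicationMaster() {
    synchronized (client) {
      // Flip the flag inside the lock so it is guarded by the same monitor
      // that protects the heartbeat call.
      keepRunning = false;
      client.unregister();
    }
  }

  public void heartbeatLoop(float progress) throws InterruptedException {
    while (true) {                    // pure run-forever loop ...
      synchronized (client) {
        if (!keepRunning) {
          break;                      // ... the flag alone decides when to stop
        }
        // No heartbeat can be sent after unregisterApplicationMaster() has run,
        // because both paths synchronize on the same client object.
        client.allocate(progress);
      }
      Thread.sleep(1000);             // heartbeat interval, arbitrary for the sketch
    }
  }
}
{code}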
[jira] [Commented] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608333#comment-13608333 ]

omkar vinit joshi commented on YARN-112:

This problem is occurring mainly because the createDir call on FileContext does not throw an exception when the file system is RawLocalFileSystem. So if the directory is already present, the new createDir will silently return instead of throwing an exception. This causes the race condition when two containers try to localize at the same time and get the same random number. However, the rename call is atomic, and to avoid the race condition we should use it.

Earlier implementation:
1) generate random num (r1)
2) check if r1 is present; if present go to 1), else go to 3)
3) create directories r1 and r1_tmp
4) copy the files into r1_tmp
5) rename r1_tmp to r1 (This is an atomic call and only one thread will succeed. The rest of them will fail. The error listed is just one of the errors which might be logged.)

Suggested fix:
1) generate random num (r1)
2) check if r1 is present; if present go to 1), else go to 3)
3) create dir r1
4) rename r1 to r1_tmp (only one will succeed; the rest of the threads will get an exception and will continue to 1)
5) check if there exists a file inside r1_tmp; if present, rename it back to r1 and go to 1), else go to 6). (This check is added because if we get threads with the same random number that pass check 2, one thread may completely finish the download, in which case it will rename r1_tmp back to r1; so for the other thread which now comes into the picture, the rename call (r1 to r1_tmp) will succeed. However this should be avoided, and we can avoid it by checking the contents of r1_tmp.)
6) create r1
7) continue with the actual file download.
8) rename r1_tmp to r1.

Race in localization can cause containers to fail

Key: YARN-112
URL: https://issues.apache.org/jira/browse/YARN-112
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: omkar vinit joshi

On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up.
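An illustrative sketch of the "atomic rename as a claim" idea from the suggested fix above, written against plain java.nio.file rather than Hadoop's FileContext. The helper name and directory layout are made up for the sketch and are not taken from any attached patch; the point is only that when several threads pick the same random number, exactly one can win the rename:

{code}
import java.io.IOException;
import java.nio.file.*;

public class LocalizerRenameSketch {

  /**
   * Tries to claim a download directory for the given random id.
   * Returns the claimed tmp directory, or null if this thread lost the race
   * and should retry with a new random id.
   */
  static Path claimDownloadDir(Path cacheRoot, long randomId) throws IOException {
    Path target = cacheRoot.resolve(Long.toString(randomId));
    Path tmp = cacheRoot.resolve(randomId + "_tmp");

    Files.createDirectories(target);              // step 3: create dir r1 (silently ok if it exists)
    try {
      // Step 4: atomic rename r1 -> r1_tmp; only one of the racing threads succeeds.
      Files.move(target, tmp, StandardCopyOption.ATOMIC_MOVE);
    } catch (IOException lostRace) {
      return null;                                 // another thread claimed it; caller retries
    }
    try (DirectoryStream<Path> contents = Files.newDirectoryStream(tmp)) {
      // Step 5: if r1_tmp already has content, an earlier download with the same id
      // finished; put the name back and retry with a new id.
      if (contents.iterator().hasNext()) {
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        return null;
      }
    }
    return tmp;                                    // steps 6-8 (download, rename back) happen in the caller
  }
}
{code}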
[jira] [Commented] (YARN-109) .tmp file is not deleted for localized archives
[ https://issues.apache.org/jira/browse/YARN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608378#comment-13608378 ]

Mayank Bansal commented on YARN-109:

[~ojoshi] Good points. Adding another patch.

Thanks,
Mayank

.tmp file is not deleted for localized archives

Key: YARN-109
URL: https://issues.apache.org/jira/browse/YARN-109
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
Assignee: Mayank Bansal
Attachments: YARN-109-trunk-1.patch, YARN-109-trunk-2.patch, YARN-109-trunk.patch

When archives are localized they are initially created as a .tmp file and unpacked from that file. However the .tmp file is not deleted afterwards.
[jira] [Updated] (YARN-109) .tmp file is not deleted for localized archives
[ https://issues.apache.org/jira/browse/YARN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated YARN-109:

Attachment: YARN-109-trunk-2.patch

.tmp file is not deleted for localized archives

Key: YARN-109
URL: https://issues.apache.org/jira/browse/YARN-109
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
Assignee: Mayank Bansal
Attachments: YARN-109-trunk-1.patch, YARN-109-trunk-2.patch, YARN-109-trunk.patch

When archives are localized they are initially created as a .tmp file and unpacked from that file. However the .tmp file is not deleted afterwards.
[jira] [Commented] (YARN-488) TestContainerManagerSecurity fails on Windows
[ https://issues.apache.org/jira/browse/YARN-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608438#comment-13608438 ]

Hitesh Shah commented on YARN-488:

{code}
if (inputClassPath != null)
{code}

Does it make sense to change this to:

{code}
if (inputClassPath != null && !inputClassPath.isEmpty())
{code}

TestContainerManagerSecurity fails on Windows

Key: YARN-488
URL: https://issues.apache.org/jira/browse/YARN-488
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: YARN-488.1.patch

These tests are failing to launch containers correctly when running on Windows.
[jira] [Commented] (YARN-470) Support a way to disable resource monitoring on the NodeManager
[ https://issues.apache.org/jira/browse/YARN-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608449#comment-13608449 ]

Hadoop QA commented on YARN-470:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12574683/YARN-470_2.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified test files.
+1 tests included appear to have a timeout.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/552//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/552//console

This message is automatically generated.

Support a way to disable resource monitoring on the NodeManager

Key: YARN-470
URL: https://issues.apache.org/jira/browse/YARN-470
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Hitesh Shah
Assignee: Siddharth Seth
Labels: usability
Attachments: YARN-470_2.txt, YARN-470.txt

Currently, the memory management monitor's check is disabled when the maxMem is set to -1. However, the maxMem is also sent to the RM when the NM registers with it (to define the max limit of allocate-able resources). We need an explicit flag to disable monitoring to avoid the problems caused by the overloading of the max memory value.
[jira] [Updated] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
[ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-417:

Attachment: YARN-417-6.patch

Updated patch makes the changes suggested in Bikas's comment, including the rebase. For the while loop, I moved things around a little in a way that seems more clear to me.

Add a poller that allows the AM to receive notifications when it is assigned containers

Key: YARN-417
URL: https://issues.apache.org/jira/browse/YARN-417
Project: Hadoop YARN
Issue Type: Sub-task
Components: api, applications
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, YARN-417-1.patch, YARN-417-2.patch, YARN-417-3.patch, YARN-417-4.patch, YARN-417-4.patch, YARN-417-5.patch, YARN-417-6.patch, YARN-417.patch, YarnAppMaster.java, YarnAppMasterListener.java

Writing AMs would be easier for some if they did not have to handle heartbeating to the RM on their own.
[jira] [Commented] (YARN-24) Nodemanager fails to start if log aggregation enabled and namenode unavailable
[ https://issues.apache.org/jira/browse/YARN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608522#comment-13608522 ]

Sandy Ryza commented on YARN-24:

I verified this on a pseudo-distributed cluster in the following way:
* Started up yarn expecting a namenode port of 7654.
* Started up HDFS with default namenode port of 9000.
* Ran a pi job.
* Verified that log aggregation failed because the nodemanager couldn't find the namenode.
* Restarted HDFS with the namenode port 7654.
* Ran another YARN job.
* Verified that the logs from the second job showed up in the UI and that the logs from the first job didn't.

Nodemanager fails to start if log aggregation enabled and namenode unavailable

Key: YARN-24
URL: https://issues.apache.org/jira/browse/YARN-24
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
Assignee: Sandy Ryza
Attachments: YARN-24-1.patch, YARN-24-2.patch, YARN-24.patch

If log aggregation is enabled and the namenode is currently unavailable, the nodemanager fails to startup.
[jira] [Assigned] (YARN-99) Jobs fail during resource localization when directories in file cache reaches to unix directory limit
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

omkar vinit joshi reassigned YARN-99:

Assignee: omkar vinit joshi (was: Devaraj K)

Jobs fail during resource localization when directories in file cache reaches to unix directory limit

Key: YARN-99
URL: https://issues.apache.org/jira/browse/YARN-99
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Devaraj K
Assignee: omkar vinit joshi

If we have multiple jobs which use the distributed cache with small files, the directory limit is reached before the cache size limit, and no further directories can be created in the file cache. The jobs start failing with the below exception.

{code:xml}
java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
	at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{code}

We should have a mechanism to clean the cache files if it crosses a specified number of directories, like the cache size.
[jira] [Updated] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-378:

Attachment: YARN-378_10.patch

Clean up some whitespace characters.

ApplicationMaster retry times should be set by Client

Key: YARN-378
URL: https://issues.apache.org/jira/browse/YARN-378
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
Labels: usability
Attachments: YARN-378_10.patch, YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, YARN-378_7.patch, YARN-378_8.patch, YARN-378_9.patch

We should support different ApplicationMaster retry times for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client.
[jira] [Updated] (YARN-488) TestContainerManagerSecurity fails on Windows
[ https://issues.apache.org/jira/browse/YARN-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated YARN-488:

Attachment: YARN-488.2.patch

Thanks, Hitesh. AFAIK, there is no significant difference between no classpath and an empty classpath, so I do think it's correct to change the condition to check for empty string. Here is a new patch that does that.

TestContainerManagerSecurity fails on Windows

Key: YARN-488
URL: https://issues.apache.org/jira/browse/YARN-488
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: YARN-488.1.patch, YARN-488.2.patch

These tests are failing to launch containers correctly when running on Windows.
[jira] [Updated] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-378:

Attachment: YARN-378_MAPREDUCE-5062.patch

Combine the latest patches of YARN-378 and MAPREDUCE-5062 to allow Jenkins to run and verify them.

ApplicationMaster retry times should be set by Client

Key: YARN-378
URL: https://issues.apache.org/jira/browse/YARN-378
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
Labels: usability
Attachments: YARN-378_10.patch, YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, YARN-378_7.patch, YARN-378_8.patch, YARN-378_9.patch, YARN-378_MAPREDUCE-5062.patch

We should support different ApplicationMaster retry times for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client.
[jira] [Commented] (YARN-488) TestContainerManagerSecurity fails on Windows
[ https://issues.apache.org/jira/browse/YARN-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608615#comment-13608615 ]

Hadoop QA commented on YARN-488:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12574718/YARN-488.2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 tests included appear to have a timeout.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/553//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/553//console

This message is automatically generated.

TestContainerManagerSecurity fails on Windows

Key: YARN-488
URL: https://issues.apache.org/jira/browse/YARN-488
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: YARN-488.1.patch, YARN-488.2.patch

These tests are failing to launch containers correctly when running on Windows.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608632#comment-13608632 ]

Xuan Gong commented on YARN-479:

Couple of comments on the latest one (479-2):

1. In the while(true) loop at NodeStatusUpdaterImpl : startStatusUpdater(): rmRetryCount++ and response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse() can be in the try block; others, such as NodeStatus nodeStatus = getNodeStatus(), etc., I think we can move out of the while(true) loop. We only consider losing the heartbeat response.
2. Please re-phrase the warning message and error message for more clarity - something along the lines of "did not get the heartbeat response ...".
3. testNMRegistration may not be a good place to test the changes. You can write your own ResourceTracker and NodeStatusUpdater to mimic the heartbeat response loss, and test whether your code can handle it properly. Take a look at the MyNodeStatusUpdater and MyResourceTracker classes; they show how to do that.

NM retry behavior for connection to RM should be similar for lost heartbeats

Key: YARN-479
URL: https://issues.apache.org/jira/browse/YARN-479
Project: Hadoop YARN
Issue Type: Bug
Reporter: Hitesh Shah
Assignee: jian he
Attachments: YARN-479.1.patch, YARN-479.2.patch

Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow.
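For item 3, the shape of such a test double is sketched below. This is purely illustrative: the ResourceTracker and HeartbeatResponse types here are simplified stand-ins invented for the sketch, not the real org.apache.hadoop.yarn.server.api classes used by MyResourceTracker in the existing tests:

{code}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class LostHeartbeatTrackerSketch {

  interface HeartbeatResponse { }

  interface ResourceTracker {
    HeartbeatResponse nodeHeartbeat(Object request) throws IOException;
  }

  /** Fails every Nth heartbeat so the NM's retry path can be exercised in a unit test. */
  static class FlakyResourceTracker implements ResourceTracker {
    private final AtomicInteger calls = new AtomicInteger();
    private final int failEvery;

    FlakyResourceTracker(int failEvery) {
      this.failEvery = failEvery;
    }

    @Override
    public HeartbeatResponse nodeHeartbeat(Object request) throws IOException {
      if (calls.incrementAndGet() % failEvery == 0) {
        // Simulate a lost response: the RM may have processed the heartbeat,
        // but the reply never made it back to the NM.
        throw new IOException("simulated lost heartbeat response");
      }
      return new HeartbeatResponse() { };
    }
  }
}
{code}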
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608634#comment-13608634 ]

Xuan Gong commented on YARN-479:

Minor question: why add Assert response != null? Trying to test a post-condition here? If response == null, what will happen? I mean, if response == null, the following code response.getNodeAction() will give an error anyway.

NM retry behavior for connection to RM should be similar for lost heartbeats

Key: YARN-479
URL: https://issues.apache.org/jira/browse/YARN-479
Project: Hadoop YARN
Issue Type: Bug
Reporter: Hitesh Shah
Assignee: jian he
Attachments: YARN-479.1.patch, YARN-479.2.patch

Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow.
[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client
[ https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608642#comment-13608642 ]

Hadoop QA commented on YARN-378:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12574721/YARN-378_MAPREDUCE-5062.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 11 new or modified test files.
+1 tests included appear to have a timeout.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 eclipse:eclipse. The patch failed to build with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/554//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/554//console

This message is automatically generated.

ApplicationMaster retry times should be set by Client

Key: YARN-378
URL: https://issues.apache.org/jira/browse/YARN-378
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
Labels: usability
Attachments: YARN-378_10.patch, YARN-378_1.patch, YARN-378_2.patch, YARN-378_3.patch, YARN-378_4.patch, YARN-378_5.patch, YARN-378_6.patch, YARN-378_6.patch, YARN-378_7.patch, YARN-378_8.patch, YARN-378_9.patch, YARN-378_MAPREDUCE-5062.patch

We should support different ApplicationMaster retry times for different clients or users. That is to say, yarn.resourcemanager.am.max-retries should be settable by the client.