[jira] [Updated] (YARN-112) Race in localization can cause containers to fail

2013-04-09 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-112:
---

Attachment: yarn-112-20130409.patch

 Race in localization can cause containers to fail
 -

 Key: YARN-112
 URL: https://issues.apache.org/jira/browse/YARN-112
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: Omkar Vinit Joshi
 Fix For: 2.0.5-beta

 Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, 
 yarn-112-20130326.patch, yarn-112-20130408.1.patch, yarn-112-20130408.patch, 
 yarn-112-20130409.patch, yarn-112.20131503.patch


 On one of our 0.23 clusters, I saw a case of two containers, corresponding to 
 two map tasks of a MR job, that were launched almost simultaneously on the 
 same node.  It appears they both tried to localize job.jar and job.xml at the 
 same time.  One of the containers failed when it couldn't rename the 
 temporary job.jar directory to its final name because the target directory 
 wasn't empty.  Shortly afterwards the second container failed because job.xml 
 could not be found, presumably because the first container removed it when it 
 cleaned up.
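A minimal sketch of one way to tolerate this kind of rename race, with hypothetical class and method names (this is illustrative, not the attached patch): if the rename fails because another container already published the same resource, reuse the published copy and discard the temporary one instead of failing the container.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical illustration of tolerating a concurrent localization of the same resource.
public class TolerantPublish {
  public static Path publish(Path tmpDir, Path finalDir) throws IOException {
    try {
      // The move fails if finalDir already exists (another container won the race).
      return Files.move(tmpDir, finalDir);
    } catch (IOException raceLost) {
      if (Files.exists(finalDir)) {
        // Someone else localized the resource first: reuse it and clean up our temp copy.
        deleteRecursively(tmpDir);
        return finalDir;
      }
      throw raceLost;   // a genuine failure, not the race
    }
  }

  private static void deleteRecursively(Path dir) throws IOException {
    try (Stream<Path> walk = Files.walk(dir)) {
      walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
    }
  }
}
{code}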

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-112) Race in localization can cause containers to fail

2013-04-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626412#comment-13626412
 ] 

Hadoop QA commented on YARN-112:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12577749/yarn-112-20130409.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/693//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/693//console

This message is automatically generated.

 Race in localization can cause containers to fail
 -

 Key: YARN-112
 URL: https://issues.apache.org/jira/browse/YARN-112
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: Omkar Vinit Joshi
 Fix For: 2.0.5-beta

 Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, 
 yarn-112-20130326.patch, yarn-112-20130408.1.patch, yarn-112-20130408.patch, 
 yarn-112-20130409.patch, yarn-112.20131503.patch


 On one of our 0.23 clusters, I saw a case of two containers, corresponding to 
 two map tasks of a MR job, that were launched almost simultaneously on the 
 same node.  It appears they both tried to localize job.jar and job.xml at the 
 same time.  One of the containers failed when it couldn't rename the 
 temporary job.jar directory to its final name because the target directory 
 wasn't empty.  Shortly afterwards the second container failed because job.xml 
 could not be found, presumably because the first container removed it when it 
 cleaned up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626466#comment-13626466
 ] 

Hudson commented on YARN-524:
-

Integrated in Hadoop-Yarn-trunk #178 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/178/])
YARN-524. Moving CHANGES.txt entry to the correct section. (Revision 
1465876)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465876
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
 --

 Key: YARN-524
 URL: https://issues.apache.org/jira/browse/YARN-524
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 3.0.0
 Environment: OS/X with branch off github
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 3.0.0

 Attachments: YARN-524.patch


 {{TestYarnVersionInfo}} fails when the {{YarnVersionInfo.getUrl()}} call 
 returns {{Unknown}}, which happens when that is the value inserted into the 
 property file.
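A hedged sketch of how a test could tolerate a build without SVN metadata (assuming plain JUnit; not necessarily what the attached patch does): accept either a real URL or the literal {{Unknown}} value.

{code:java}
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class VersionUrlToleranceExample {
  // Stand-in for YarnVersionInfo.getUrl(); the real call reads the generated properties.
  private static String getUrl() {
    return System.getProperty("yarn.test.version.url", "Unknown");
  }

  @Test
  public void testGetUrlToleratesUnknown() {
    String url = getUrl();
    // Builds made from a git checkout may have no SVN URL; "Unknown" is then legitimate.
    assertTrue("unexpected URL value: " + url,
        "Unknown".equals(url) || url.startsWith("http") || url.startsWith("git"));
  }
}
{code}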

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626468#comment-13626468
 ] 

Hudson commented on YARN-479:
-

Integrated in Hadoop-Yarn-trunk #178 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/178/])
YARN-479. NM retry behavior for connection to RM should be similar for lost 
heartbeats (Jian He via bikas) (Revision 1465731)

 Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465731
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java


 NM retry behavior for connection to RM should be similar for lost heartbeats
 

 Key: YARN-479
 URL: https://issues.apache.org/jira/browse/YARN-479
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jian He
 Fix For: 2.0.5-beta

 Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, 
 YARN-479.4.patch, YARN-479.5.patch, YARN-479.6.patch, YARN-479.7.patch, 
 YARN-479.8.patch, YARN-479.9.patch


 Regardless of connection loss at the start or at an intermediate point, NM's 
 retry behavior to the RM should follow the same flow. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626469#comment-13626469
 ] 

Hudson commented on YARN-99:


Integrated in Hadoop-Yarn-trunk #178 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/178/])
YARN-99. Modify private distributed cache to localize files such that no 
local directory hits unix file count limits and thus prevent job failures. 
Contributed by Omkar Vinit Joshi. (Revision 1465853)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465853
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/ResourceLocalizationSpec.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/impl/pb/ResourceLocalizationSpecPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/LocalizerHeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/impl/pb/LocalizerHeartbeatResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerBuilderUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/impl/pb/TestPBRecordImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/MockLocalizerHeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
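The new {{LocalCacheDirectoryManager}} listed above is meant to spread localized files across nested sub-directories so that no single directory hits the per-directory entry limit. A minimal sketch of that allocation idea, with hypothetical names and a made-up per-directory cap (not the committed code):

{code:java}
// Illustrative only: hand out relative sub-directory paths ("", "0", "1", ..., "0/0", ...)
// so that each directory holds at most PER_DIR_LIMIT entries.
import java.util.ArrayDeque;
import java.util.Deque;

public class DirectoryFanOutExample {
  private static final int PER_DIR_LIMIT = 36;   // hypothetical cap per directory

  private final Deque<String> available = new ArrayDeque<>();
  private int filesInCurrent = 0;
  private String current = "";                   // "" means the cache root itself

  public synchronized String nextSubDirectory() {
    if (filesInCurrent >= PER_DIR_LIMIT) {
      // Current directory is full: queue its children and move to the next free slot.
      for (int i = 0; i < PER_DIR_LIMIT; i++) {
        available.addLast(current.isEmpty() ? Integer.toString(i, 36)
                                            : current + "/" + Integer.toString(i, 36));
      }
      current = available.removeFirst();
      filesInCurrent = 0;
    }
    filesInCurrent++;
    return current;
  }
}
{code}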


 Jobs fail during resource localization when private distributed-cache hits 
 unix directory limits
 

 Key: YARN-99
 URL: https://issues.apache.org/jira/browse/YARN-99
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Devaraj K
Assignee: Omkar Vinit Joshi
 Fix For: 2.0.5-beta

 Attachments: yarn-99-20130324.patch, yarn-99-20130403.1.patch, 
 yarn-99-20130403.patch, yarn-99-20130408.1.patch, yarn-99-20130408.patch


 If we have multiple jobs that use the distributed cache with many small 
 files, the per-directory limit is reached before the cache-size limit, and no 
 further directories can be created in the file cache. The jobs start failing 
 with the exception below.
 {code:xml}
 java.io.IOException: mkdir of 
 /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
   at 
 org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
   

[jira] [Commented] (YARN-557) TestUnmanagedAMLauncher fails on Windows

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626555#comment-13626555
 ] 

Hudson commented on YARN-557:
-

Integrated in Hadoop-Hdfs-trunk #1367 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1367/])
YARN-557. Fix TestUnmanagedAMLauncher failure on Windows. Contributed by 
Chris Nauroth. (Revision 1465869)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465869
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java


 TestUnmanagedAMLauncher fails on Windows
 

 Key: YARN-557
 URL: https://issues.apache.org/jira/browse/YARN-557
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.0.0

 Attachments: YARN-557.1.patch


 {{TestUnmanagedAMLauncher}} fails on Windows due to attempting to run a 
 Unix-specific command in distributed shell and use of a Unix-specific 
 environment variable to determine username for the {{ContainerLaunchContext}}.
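A hedged sketch of the usual portability fix (assuming {{org.apache.hadoop.util.Shell.WINDOWS}}; the actual patch may differ): pick the shell command and the username environment variable per platform instead of hard-coding the Unix ones.

{code:java}
import org.apache.hadoop.util.Shell;

public class PortableTestHelpers {
  // Command for the distributed-shell test: Unix "date" vs. a Windows equivalent.
  public static String testCommand() {
    return Shell.WINDOWS ? "cmd /c date /t" : "date";
  }

  // Username for the ContainerLaunchContext: USER is Unix-specific, USERNAME is Windows.
  public static String currentUser() {
    String user = System.getenv(Shell.WINDOWS ? "USERNAME" : "USER");
    return user != null ? user : System.getProperty("user.name");
  }
}
{code}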

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626559#comment-13626559
 ] 

Hudson commented on YARN-479:
-

Integrated in Hadoop-Hdfs-trunk #1367 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1367/])
YARN-479. NM retry behavior for connection to RM should be similar for lost 
heartbeats (Jian He via bikas) (Revision 1465731)

 Result = FAILURE
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465731
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java


 NM retry behavior for connection to RM should be similar for lost heartbeats
 

 Key: YARN-479
 URL: https://issues.apache.org/jira/browse/YARN-479
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jian He
 Fix For: 2.0.5-beta

 Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, 
 YARN-479.4.patch, YARN-479.5.patch, YARN-479.6.patch, YARN-479.7.patch, 
 YARN-479.8.patch, YARN-479.9.patch


 Regardless of connection loss at the start or at an intermediate point, NM's 
 retry behavior to the RM should follow the same flow. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626560#comment-13626560
 ] 

Hudson commented on YARN-99:


Integrated in Hadoop-Hdfs-trunk #1367 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1367/])
YARN-99. Modify private distributed cache to localize files such that no 
local directory hits unix file count limits and thus prevent job failures. 
Contributed by Omkar Vinit Joshi. (Revision 1465853)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465853
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/ResourceLocalizationSpec.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/impl/pb/ResourceLocalizationSpecPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/LocalizerHeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/impl/pb/LocalizerHeartbeatResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerBuilderUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/impl/pb/TestPBRecordImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/MockLocalizerHeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


 Jobs fail during resource localization when private distributed-cache hits 
 unix directory limits
 

 Key: YARN-99
 URL: https://issues.apache.org/jira/browse/YARN-99
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Devaraj K
Assignee: Omkar Vinit Joshi
 Fix For: 2.0.5-beta

 Attachments: yarn-99-20130324.patch, yarn-99-20130403.1.patch, 
 yarn-99-20130403.patch, yarn-99-20130408.1.patch, yarn-99-20130408.patch


 If we have multiple jobs that use the distributed cache with many small 
 files, the per-directory limit is reached before the cache-size limit, and no 
 further directories can be created in the file cache. The jobs start failing 
 with the exception below.
 {code:xml}
 java.io.IOException: mkdir of 
 /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
   at 
 org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
   

[jira] [Commented] (YARN-557) TestUnmanagedAMLauncher fails on Windows

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626579#comment-13626579
 ] 

Hudson commented on YARN-557:
-

Integrated in Hadoop-Mapreduce-trunk #1394 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1394/])
YARN-557. Fix TestUnmanagedAMLauncher failure on Windows. Contributed by 
Chris Nauroth. (Revision 1465869)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465869
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java


 TestUnmanagedAMLauncher fails on Windows
 

 Key: YARN-557
 URL: https://issues.apache.org/jira/browse/YARN-557
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: 3.0.0

 Attachments: YARN-557.1.patch


 {{TestUnmanagedAMLauncher}} fails on Windows due to attempting to run a 
 Unix-specific command in distributed shell and use of a Unix-specific 
 environment variable to determine username for the {{ContainerLaunchContext}}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626581#comment-13626581
 ] 

Hudson commented on YARN-524:
-

Integrated in Hadoop-Mapreduce-trunk #1394 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1394/])
YARN-524. Moving CHANGES.txt entry to the correct section. (Revision 
1465876)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465876
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
 --

 Key: YARN-524
 URL: https://issues.apache.org/jira/browse/YARN-524
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 3.0.0
 Environment: OS/X with branch off github
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 3.0.0

 Attachments: YARN-524.patch


 {{TestYarnVersionInfo}} fails when the {{YarnVersionInfo.getUrl()}} call 
 returns {{Unknown}}, which happens when that is the value inserted into the 
 property file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626583#comment-13626583
 ] 

Hudson commented on YARN-479:
-

Integrated in Hadoop-Mapreduce-trunk #1394 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1394/])
YARN-479. NM retry behavior for connection to RM should be similar for lost 
heartbeats (Jian He via bikas) (Revision 1465731)

 Result = FAILURE
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465731
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java


 NM retry behavior for connection to RM should be similar for lost heartbeats
 

 Key: YARN-479
 URL: https://issues.apache.org/jira/browse/YARN-479
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jian He
 Fix For: 2.0.5-beta

 Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, 
 YARN-479.4.patch, YARN-479.5.patch, YARN-479.6.patch, YARN-479.7.patch, 
 YARN-479.8.patch, YARN-479.9.patch


 Regardless of connection loss at the start or at an intermediate point, NM's 
 retry behavior to the RM should follow the same flow. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626584#comment-13626584
 ] 

Hudson commented on YARN-99:


Integrated in Hadoop-Mapreduce-trunk #1394 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1394/])
YARN-99. Modify private distributed cache to localize files such that no 
local directory hits unix file count limits and thus prevent job failures. 
Contributed by Omkar Vinit Joshi. (Revision 1465853)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465853
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/ResourceLocalizationSpec.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/impl/pb/ResourceLocalizationSpecPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/LocalizerHeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/impl/pb/LocalizerHeartbeatResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerBuilderUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/impl/pb/TestPBRecordImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/MockLocalizerHeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


 Jobs fail during resource localization when private distributed-cache hits 
 unix directory limits
 

 Key: YARN-99
 URL: https://issues.apache.org/jira/browse/YARN-99
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Devaraj K
Assignee: Omkar Vinit Joshi
 Fix For: 2.0.5-beta

 Attachments: yarn-99-20130324.patch, yarn-99-20130403.1.patch, 
 yarn-99-20130403.patch, yarn-99-20130408.1.patch, yarn-99-20130408.patch


 If we have multiple jobs that use the distributed cache with many small 
 files, the per-directory limit is reached before the cache-size limit, and no 
 further directories can be created in the file cache. The jobs start failing 
 with the exception below.
 {code:xml}
 java.io.IOException: mkdir of 
 /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
   at 
 

[jira] [Resolved] (YARN-122) CompositeService should clone the Configurations it passes to children

2013-04-09 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-122.
-

Resolution: Invalid

Resolving as INVALID as some composite services (e.g. {{MiniYARNCluster}}) use 
the shared config object as a way of sharing information -such as allocated ports.
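A small illustration of the pattern behind the INVALID resolution (hypothetical child-service names, not the real MiniYARNCluster code): one child writes an allocated port into the shared {{Configuration}} and a sibling reads it during its own init, which only works because the parent did *not* clone the config per child.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative only: why cloning the Configuration per child would break port sharing.
public class SharedConfExample {
  static class RpcChild {
    void init(Configuration conf) {
      int port = 54321;                              // pretend this was bound dynamically
      conf.setInt("example.allocated.port", port);   // publish via the shared conf
    }
  }

  static class WebChild {
    void init(Configuration conf) {
      // Sees the sibling's value only if both children got the *same* conf instance.
      int port = conf.getInt("example.allocated.port", -1);
      System.out.println("sibling port = " + port);
    }
  }

  public static void main(String[] args) {
    Configuration shared = new Configuration(false);
    new RpcChild().init(shared);
    new WebChild().init(shared);   // prints 54321; with cloned configs it would print -1
  }
}
{code}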

 CompositeService should clone the Configurations it passes to children
 --

 Key: YARN-122
 URL: https://issues.apache.org/jira/browse/YARN-122
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran
Priority: Minor
   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 {{CompositeService.init(Configuration)}} saves the configuration passed in 
 *and* passes the same instance down to all managed services. This means a 
 change in the configuration of one child could propagate to all the others.
 Unless this is desired, the configuration should be cloned for each child.
 Fast and easy fix; tests can be added to those coming in MAPREDUCE-4014

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs

2013-04-09 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-535:


Attachment: YARN-535.patch

This patch verifies that the conf to write has picked up the hostnames, then 
saves the config XML to a string *before* trying to write it to the classpath'd 
resource.
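A minimal sketch of the write-to-memory-first idea described above (helper name is hypothetical): serialize the {{Configuration}} fully in memory, so the reread triggered by {{writeXml()}} never sees a half-written {{yarn-site.xml}}, then write the bytes out in one pass.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;

public class SafeConfWriter {
  public static void writeConf(Configuration conf, File target) throws IOException {
    // Serialize first: writeXml() re-reads resources, so the target file must not yet be
    // visible as an empty or partially written resource while this runs.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    conf.writeXml(buffer);
    byte[] xml = buffer.toByteArray();

    // Only now touch the classpath'd file, writing the fully built XML in one go.
    try (OutputStream out = new FileOutputStream(target)) {
      out.write(xml);
    }
  }
}
{code}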

 TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during 
 write phase, breaks later test runs
 

 Key: YARN-535
 URL: https://issues.apache.org/jira/browse/YARN-535
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications
Affects Versions: 3.0.0
 Environment: OS/X laptop, HFS+ filesystem
Reporter: Steve Loughran
Priority: Minor
 Attachments: YARN-535.patch


 the setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. 
 As {{Configuration.writeXml()}} does a reread of all resources, this will 
 break if the (open-for-writing) resource is already visible as an empty file. 
 This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks 
 later test runs -because it is not overwritten by later incremental builds, 
 due to timestamps.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-04-09 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-530:


Attachment: YARN-530-2.patch

This patch
* includes the static service notification tests of YARN-119
* catches when a service replaces or clones the original configuration it was 
started with, and updates the copy stored in the base service class.

 Define Service model strictly, implement AbstractService for robust 
 subclassing, migrate yarn-common services
 -

 Key: YARN-530
 URL: https://issues.apache.org/jira/browse/YARN-530
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117changes.pdf, YARN-530-2.patch, YARN-530.patch


 # Extend the YARN {{Service}} interface as discussed in YARN-117
 # Implement the changes in {{AbstractService}} and {{FilterService}}.
 # Migrate all services in yarn-common to the more robust service model, test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-117) Enhance YARN service model

2013-04-09 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-117:


Attachment: YARN-117-2.patch

This is an aggregate patch for jenkins and testing; YARN-530 is the subset for 
real review. This patch:
* contains a code review of all {{innerInit()}}, {{innerStart()}} ops
* contains a code review of all {{innerStop()}} operations to harden against 
null values (MAPREDUCE-3502)
* incorporates YARN-535 to fix a race condition in one test

 Enhance YARN service model
 --

 Key: YARN-117
 URL: https://issues.apache.org/jira/browse/YARN-117
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117-2.patch, YARN-117.patch


 Having played with the YARN service model, there are some issues
 that I've identified based on past work and initial use.
 This JIRA issue is an overall one to cover the issues, with solutions pushed 
 out to separate JIRAs.
 h2. state model prevents stopped state being entered if you could not 
 successfully start the service.
 In the current lifecycle you cannot stop a service unless it was successfully 
 started, but
 * {{init()}} may acquire resources that need to be explicitly released
 * if the {{start()}} operation fails partway through, the {{stop()}} 
 operation may be needed to release resources.
 *Fix:* make {{stop()}} a valid state transition from all states and require 
 the implementations to be able to stop safely without requiring all fields to 
 be non null.
 Before anyone points out that the {{stop()}} operations assume that all 
 fields are valid; and if called before a {{start()}} they will NPE; 
 MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix 
 for this. It is independent of the rest of the issues in this doc but it will 
 aid making {{stop()}} execute from all states other than stopped.
 MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
 review and take up; this can be done with issues linked to this one.
 h2. AbstractService doesn't prevent duplicate state change requests.
 The {{ensureState()}} checks to verify whether or not a state transition is 
 allowed from the current state are performed in the base {{AbstractService}} 
 class -yet subclasses tend to call this *after* their own {{init()}}, 
 {{start()}} & {{stop()}} operations. This means that these operations can be 
 performed out of order, and even if the outcome of the call is an exception, 
 all actions performed by the subclasses will have taken place. MAPREDUCE-3877 
 demonstrates this.
 This is a tricky one to address. In HADOOP-3128 I used a base class instead 
 of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods 
 {{final}}. These methods would do the checks, and then invoke protected inner 
 methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to 
 retrofit the same behaviour to everything that extends {{AbstractService}} 
 -something that must be done before the class is considered stable (because 
 once the lifecycle methods are declared final, all subclasses that are out of 
 the source tree will need fixing by the respective developers).
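A compressed sketch of the scheme described above (names follow the {{innerStart()}} convention mentioned here, but this is illustrative, not the committed design): the base class makes the lifecycle methods final, checks and changes state under a lock, and only then delegates to protected hooks, which also covers the re-entrancy concern in the next section.

{code:java}
// Illustrative base class: final lifecycle methods guard state, then delegate.
public abstract class GuardedService {
  public enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  public final synchronized void init() {
    if (state != State.NOTINITED) {
      throw new IllegalStateException("cannot init from " + state);
    }
    innerInit();           // subclass work happens only after the check
    state = State.INITED;
  }

  public final synchronized void start() {
    if (state != State.INITED) {
      throw new IllegalStateException("cannot start from " + state);
    }
    innerStart();
    state = State.STARTED;
  }

  public final synchronized void stop() {
    if (state == State.STOPPED) {
      return;              // stop() is valid (and a no-op) from any state
    }
    state = State.STOPPED;
    innerStop();           // must tolerate fields that were never initialized
  }

  protected void innerInit() {}
  protected void innerStart() {}
  protected void innerStop() {}
}
{code}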
 h2. AbstractService state change doesn't defend against race conditions.
 There's no concurrency locks on the state transitions. Whatever fix for wrong 
 state calls is added should correct this to prevent re-entrancy, such as 
 {{stop()}} being called from two threads.
 h2. Static methods to choreograph lifecycle operations
 Helper methods to move things through lifecycles. init-start is common, 
 stop-if-service!=null is another. Some static methods can execute these, and 
 even call {{stop()}} if {{init()}} raises an exception. These could go into a 
 class {{ServiceOps}} in the same package. These can be used by those services 
 that wrap other services, and help manage more robust shutdowns.
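A hedged sketch of the {{ServiceOps}} helpers suggested above, written against the illustrative {{GuardedService}} from the previous sketch rather than the real YARN {{Service}} interface:

{code:java}
// Illustrative static choreography helpers (the "ServiceOps" idea).
public final class ServiceOps {
  private ServiceOps() {}

  // init + start in one step; if start() fails, stop() to release what init() acquired.
  public static void initAndStart(GuardedService service) {
    service.init();
    try {
      service.start();
    } catch (RuntimeException e) {
      stopQuietly(service);
      throw e;
    }
  }

  // The common "stop if not null, swallow failures during shutdown" pattern.
  public static void stopQuietly(GuardedService service) {
    if (service == null) {
      return;
    }
    try {
      service.stop();
    } catch (RuntimeException ignored) {
      // best-effort cleanup; a failed stop should not mask the original problem
    }
  }
}
{code}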
 h2. state transition failures are something that registered service listeners 
 may wish to be informed of.
 When a state transition fails a {{RuntimeException}} can be thrown -and the 
 service listeners are not informed as the notification point isn't reached. 
 They may wish to know this, especially for management and diagnostics.
 *Fix:* extend {{ServiceStateChangeListener}} with a callback such as 
 {{stateChangeFailed(Service service,Service.State targeted-state, 
 RuntimeException e)}} that is invoked from the (final) state change methods 
 in the {{AbstractService}} class (once they delegate to their inner 
 {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing 
 implementations of the interface.
 h2. Service listener failures not handled
 Is this an error or not? Log-and-ignore may not be what is desired.
 *Proposed:* during {{stop()}} any exception 

[jira] [Commented] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs

2013-04-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626867#comment-13626867
 ] 

Hadoop QA commented on YARN-535:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577836/YARN-535.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/694//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/694//console

This message is automatically generated.

 TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during 
 write phase, breaks later test runs
 

 Key: YARN-535
 URL: https://issues.apache.org/jira/browse/YARN-535
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications
Affects Versions: 3.0.0
 Environment: OS/X laptop, HFS+ filesystem
Reporter: Steve Loughran
Priority: Minor
 Attachments: YARN-535.patch


 the setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. 
 As {{Configuration.writeXml()}} does a reread of all resources, this will 
 break if the (open-for-writing) resource is already visible as an empty file. 
 This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks 
 later test runs -because it is not overwritten by later incremental builds, 
 due to timestamps.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-558) Add ability to completely remove nodemanager from resourcemanager.

2013-04-09 Thread Garth Goodson (JIRA)
Garth Goodson created YARN-558:
--

 Summary: Add ability to completely remove nodemanager from 
resourcemanager.
 Key: YARN-558
 URL: https://issues.apache.org/jira/browse/YARN-558
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Garth Goodson
Priority: Minor


I would like to add the ability to completely remove a nodemanager from the 
resourcemanager's state.

I run a cloud service where I want to dynamically bring up nodes to act as 
nodemanagers and then bring them down again when not needed.  These nodes have 
dynamically assigned IPs, thus the alternative of decommissioning them via an 
excludes file leads to a large (unbounded) list of decommissioned nodes that 
may never be commissioned again. I would like the ability to move a node from a 
decommissioned state to being completely removed from the resource manager.

I have thought of two ways of implementing this.

1) Add an optional timeout between a nodemanager entering the decommissioned 
state and its removal from the resourcemanager's state.

2) Add an explicit RPC to remove a node that is decommissioned.

Any additional thoughts/discussion are welcome.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-04-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626880#comment-13626880
 ] 

Hadoop QA commented on YARN-530:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577837/YARN-530-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
33 warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/695//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/695//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/695//console

This message is automatically generated.

 Define Service model strictly, implement AbstractService for robust 
 subclassing, migrate yarn-common services
 -

 Key: YARN-530
 URL: https://issues.apache.org/jira/browse/YARN-530
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117changes.pdf, YARN-530-2.patch, YARN-530.patch


 # Extend the YARN {{Service}} interface as discussed in YARN-117
 # Implement the changes in {{AbstractService}} and {{FilterService}}.
 # Migrate all services in yarn-common to the more robust service model, test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-117) Enhance YARN service model

2013-04-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626895#comment-13626895
 ] 

Hadoop QA commented on YARN-117:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577838/YARN-117-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 19 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
33 warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/696//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/696//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/696//console

This message is automatically generated.

 Enhance YARN service model
 --

 Key: YARN-117
 URL: https://issues.apache.org/jira/browse/YARN-117
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117-2.patch, YARN-117.patch


 Having played with the YARN service model, there are some issues
 that I've identified based on past work and initial use.
 This JIRA issue is an overall one to cover the issues, with solutions pushed 
 out to separate JIRAs.
 h2. state model prevents stopped state being entered if you could not 
 successfully start the service.
 In the current lifecycle you cannot stop a service unless it was successfully 
 started, but
 * {{init()}} may acquire resources that need to be explicitly released
 * if the {{start()}} operation fails partway through, the {{stop()}} 
 operation may be needed to release resources.
 *Fix:* make {{stop()}} a valid state transition from all states and require 
 the implementations to be able to stop safely without requiring all fields to 
 be non null.
 Before anyone points out that the {{stop()}} operations assume that all 
 fields are valid; and if called before a {{start()}} they will NPE; 
 MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix 
 for this. It is independent of the rest of the issues in this doc but it will 
 aid making {{stop()}} execute from all states other than stopped.
 MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
 review and take up; this can be done with issues linked to this one.
 h2. AbstractService doesn't prevent duplicate state change requests.
 The {{ensureState()}} checks to verify whether or not a state transition is 
 allowed from the current state are performed in the base {{AbstractService}} 
 class -yet subclasses tend to call this *after* their own {{init()}}, 
 {{start()}} & {{stop()}} operations. This means that these operations can be 
 performed out of order, and even if the outcome of the call is an exception, 
 all actions performed by the subclasses will have taken place. MAPREDUCE-3877 
 demonstrates this.
 This is a tricky one to address. In HADOOP-3128 I used a base class instead 
 of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods 
 {{final}}. These methods would do the checks, and then invoke protected inner 
 methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to 
 retrofit the same behaviour to everything that extends {{AbstractService}} 
 -something that must be done before the class is considered stable (because 
 once the 

[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging

2013-04-09 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626930#comment-13626930
 ] 

Sandy Ryza commented on YARN-366:
-

bq. Let's make it a class-name

Sounds good.

bq. I don't like how we are passing configuration around in the 
ContainerManagerImpl constructor, we have Service.init(Configuration) exactly 
for this purpose.

This is what I tried first, but I came across the following problem, stemming 
from the fact that both WebServer and ContainerManager have a 
ContainersMonitor:
* ContainerManager needs to pass its dispatcher to its ContainersMonitor when 
it instantiates it.
* To address this, we can instantiate ContainersMonitor in ContainerManager's 
init instead of its constructor.
* The way that composite services work, parent service inits tend to happen 
before child service inits.
* In NodeManager's init method, it instantiates a WebServer, which takes a 
ContainersMonitor as an argument.
* Because of the above, the NodeManager's ContainerManager's ContainersMonitor 
will not be instantiated yet.

To work around this, we'd need to do something like move the hierarchy around a 
little or set the WebServer's ContainersMonitor separately from the WebServer's 
instantiation.  What do you think?
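A stripped-down illustration of the ordering issue described above (class names mirror the discussion, but this is not the real NodeManager or CompositeService code): the parent's init body runs before the children's, so anything the parent constructs there cannot depend on state a child only creates during its own init.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative composite-service ordering: parent init body runs before child inits.
class MiniCompositeService {
  private final List<MiniCompositeService> children = new ArrayList<>();

  protected void addService(MiniCompositeService child) { children.add(child); }

  public final void init() {
    serviceInit();                           // parent's own init body first...
    for (MiniCompositeService c : children) {
      c.init();                              // ...then the children's
    }
  }

  protected void serviceInit() {}
}

class ContainerManager extends MiniCompositeService {
  Object containersMonitor;                  // created only during this child's init
  @Override protected void serviceInit() { containersMonitor = new Object(); }
}

class NodeManager extends MiniCompositeService {
  private final ContainerManager cm = new ContainerManager();
  { addService(cm); }

  @Override protected void serviceInit() {
    // At this point cm.init() has NOT run yet, so its monitor is still null:
    System.out.println("monitor available? " + (cm.containersMonitor != null)); // false
  }

  public static void main(String[] args) {
    new NodeManager().init();
  }
}
{code}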

 Add a tracing async dispatcher to simplify debugging
 

 Key: YARN-366
 URL: https://issues.apache.org/jira/browse/YARN-366
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366.patch


 Exceptions thrown in YARN/MR code with asynchronous event handling do not 
 contain informative stack traces, as all handle() methods sit directly under 
 the dispatcher thread's loop.
 This makes errors very difficult to debug for those who are not intimately 
 familiar with the code, as it is difficult to see which chain of events 
 caused a particular outcome.
 I propose adding an AsyncDispatcher that instruments events with tracing 
 information.  Whenever an event is dispatched during the handling of another 
 event, the dispatcher would annotate that event with a pointer to its parent. 
  When the dispatcher catches an exception, it could reconstruct a stack 
 trace of the chain of events that led to it, and be able to log something 
 informative.
 This would be an experimental feature, off by default, unless extensive 
 testing showed that it did not have a significant performance impact.
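A minimal sketch of the annotation idea (hypothetical event type, handler interface, and synchronous dispatch for brevity; not the proposed patch): each event carries a pointer to the event that was being handled when it was dispatched, so an exception handler can walk the chain.

{code:java}
// Illustrative only: trace which event caused which.
public class TracingDispatcherSketch {

  public static class TracedEvent {
    final String name;
    final TracedEvent parent;       // event being handled when this one was dispatched
    TracedEvent(String name, TracedEvent parent) { this.name = name; this.parent = parent; }
  }

  public interface Handler { void handle(TracedEvent event); }

  private final Handler handler;
  private TracedEvent currentlyHandling;   // single dispatcher thread assumed

  public TracingDispatcherSketch(Handler handler) { this.handler = handler; }

  public void dispatch(String name) {
    TracedEvent event = new TracedEvent(name, currentlyHandling);
    TracedEvent previous = currentlyHandling;
    currentlyHandling = event;
    try {
      handler.handle(event);                // the handler may call dispatch() again
    } catch (RuntimeException e) {
      // Reconstruct the chain of events that led here, not just the dispatcher-loop frame.
      StringBuilder chain = new StringBuilder(event.name);
      for (TracedEvent p = event.parent; p != null; p = p.parent) {
        chain.append(" <- ").append(p.name);
      }
      System.err.println("Event chain: " + chain);
      throw e;
    } finally {
      currentlyHandling = previous;
    }
  }
}
{code}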

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs

2013-04-09 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626943#comment-13626943
 ] 

Chris Nauroth commented on YARN-535:


Hi, Steve.  The patch looks good.  I verified that the test passes, running on 
Mac and Windows.

Do you also want to include {{TestDistributedShell#setup}} in the patch?  I 
expect the change would be nearly identical.


 TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during 
 write phase, breaks later test runs
 

 Key: YARN-535
 URL: https://issues.apache.org/jira/browse/YARN-535
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications
Affects Versions: 3.0.0
 Environment: OS/X laptop, HFS+ filesystem
Reporter: Steve Loughran
Priority: Minor
 Attachments: YARN-535.patch


 the setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. 
 As {{Configuration.writeXml()}} does a reread of all resources, this will 
 break if the (open-for-writing) resource is already visible as an empty file. 
 This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks 
 later test runs -because it is not overwritten by later incremental builds, 
 due to timestamps.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs

2013-04-09 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned YARN-535:
--

Assignee: Steve Loughran

 TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during 
 write phase, breaks later test runs
 

 Key: YARN-535
 URL: https://issues.apache.org/jira/browse/YARN-535
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications
Affects Versions: 3.0.0
 Environment: OS/X laptop, HFS+ filesystem
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Attachments: YARN-535.patch


 the setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. 
 As {{Configuration.writeXml()}} does a reread of all resources, this will 
 break if the (open-for-writing) resource is already visible as an empty file. 
 This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks 
 later test runs -because it is not overwritten by later incremental builds, 
 due to timestamps.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-546) mapred.fairscheduler.eventlog.enabled removed from Hadoop 2.0

2013-04-09 Thread Lohit Vijayarenu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lohit Vijayarenu updated YARN-546:
--

Attachment: YARN-546.1.patch

It looks like not much is logged in the event log, and the majority of it seems to 
be just node updates. If no one objects to having this removed, then here is 
a patch for review.
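
For context, the Hadoop 1.0 behaviour being discussed amounted to a boolean guard 
around event-log initialization, roughly like the sketch below; the surrounding 
class, method, and default value are simplified assumptions, not the real 
FairScheduler code.

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch of the kind of guard that existed in the Hadoop 1.0 FairScheduler;
// this class is a simplified stand-in, not the real scheduler.
class EventLogGuardSketch {
  private static final String EVENTLOG_ENABLED = "mapred.fairscheduler.eventlog.enabled";

  static boolean shouldInitEventLog(Configuration conf) {
    // Assumed default of false here: event logging stays off unless enabled.
    return conf.getBoolean(EVENTLOG_ENABLED, false);
  }
}
{code}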

 mapred.fairscheduler.eventlog.enabled removed from Hadoop 2.0
 -

 Key: YARN-546
 URL: https://issues.apache.org/jira/browse/YARN-546
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
 Attachments: YARN-546.1.patch


 Hadoop 1.0 supported an option to turn on/off FairScheduler event logging 
 using mapred.fairscheduler.eventlog.enabled. In Hadoop 2.0, it looks like 
 this option has been removed (or not ported?), which causes event logging to 
 be enabled by default with no way to turn it off.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-112) Race in localization can cause containers to fail

2013-04-09 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626999#comment-13626999
 ] 

Vinod Kumar Vavilapalli commented on YARN-112:
--

The eclipse:eclipse check passes on my laptop; I believe the failure is due to an 
unclean .m2 on the build machines (HADOOP-9251).

Checking this in.

 Race in localization can cause containers to fail
 -

 Key: YARN-112
 URL: https://issues.apache.org/jira/browse/YARN-112
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: Omkar Vinit Joshi
 Fix For: 2.0.5-beta

 Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, 
 yarn-112-20130326.patch, yarn-112-20130408.1.patch, yarn-112-20130408.patch, 
 yarn-112-20130409.patch, yarn-112.20131503.patch


 On one of our 0.23 clusters, I saw a case of two containers, corresponding to 
 two map tasks of a MR job, that were launched almost simultaneously on the 
 same node.  It appears they both tried to localize job.jar and job.xml at the 
 same time.  One of the containers failed when it couldn't rename the 
 temporary job.jar directory to its final name because the target directory 
 wasn't empty.  Shortly afterwards the second container failed because job.xml 
 could not be found, presumably because the first container removed it when it 
 cleaned up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-112) Race in localization can cause containers to fail

2013-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627015#comment-13627015
 ] 

Hudson commented on YARN-112:
-

Integrated in Hadoop-trunk-Commit #3584 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3584/])
YARN-112. Fixed a race condition during localization that fails containers. 
Contributed by Omkar Vinit Joshi.
MAPREDUCE-5138. Fix LocalDistributedCacheManager after YARN-112. Contributed by 
Omkar Vinit Joshi. (Revision 1466196)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1466196
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


 Race in localization can cause containers to fail
 -

 Key: YARN-112
 URL: https://issues.apache.org/jira/browse/YARN-112
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 0.23.3
Reporter: Jason Lowe
Assignee: Omkar Vinit Joshi
 Fix For: 2.0.5-beta

 Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, 
 yarn-112-20130326.patch, yarn-112-20130408.1.patch, yarn-112-20130408.patch, 
 yarn-112-20130409.patch, yarn-112.20131503.patch


 On one of our 0.23 clusters, I saw a case of two containers, corresponding to 
 two map tasks of a MR job, that were launched almost simultaneously on the 
 same node.  It appears they both tried to localize job.jar and job.xml at the 
 same time.  One of the containers failed when it couldn't rename the 
 temporary job.jar directory to its final name because the target directory 
 wasn't empty.  Shortly afterwards the second container failed because job.xml 
 could not be found, presumably because the first container removed it when it 
 cleaned up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-534) AM max attempts is not checked when RM restart and try to recover attempts

2013-04-09 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627018#comment-13627018
 ] 

Vinod Kumar Vavilapalli commented on YARN-534:
--

The latest patch looks good, +1.

The eclipse:eclipse problem is due to an unclean .m2 on the build machine; the 
build passes on my laptop.

Checking this in.

 AM max attempts is not checked when RM restart and try to recover attempts
 --

 Key: YARN-534
 URL: https://issues.apache.org/jira/browse/YARN-534
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-534.1.patch, YARN-534.2.patch, YARN-534.3.patch, 
 YARN-534.4.patch


 Currently, AM max attempts is only checked when the current attempt fails, to 
 decide whether to create a new attempt. If the RM restarts before the 
 max-attempt fails, it will not clean the state store; when the RM comes back, it 
 will retry the attempt again.
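
A minimal sketch of the missing guard, using stand-in types rather than the real RM 
recovery code; the class, method, and parameter names here are illustrative 
assumptions only.

{code}
// Illustrative stand-in for the recovery-time check being discussed; the real
// fix lives in the RM's application recovery path and uses different types.
class RecoverySketch {
  /**
   * Decide whether a recovered application should get another attempt.
   *
   * @param recoveredAttempts number of attempts found in the state store
   * @param maxAppAttempts    configured AM max attempts
   */
  static boolean shouldStartNewAttempt(int recoveredAttempts, int maxAppAttempts) {
    // Without this guard the RM would keep retrying after a restart even
    // though the application has already exhausted its attempts.
    return recoveredAttempts < maxAppAttempts;
  }
}
{code}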

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-559) Make all YARN API and libraries available through an api jar

2013-04-09 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-559:
---

 Summary: Make all YARN API and libraries available through an api 
jar
 Key: YARN-559
 URL: https://issues.apache.org/jira/browse/YARN-559
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha


This should be the dependency for interacting with YARN and would prevent 
unnecessary leakage of other internal stuff.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Moved] (YARN-560) If AM fails due to overrunning resource limits, error not visible through UI sometimes

2013-04-09 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah moved MAPREDUCE-3949 to YARN-560:
-

Affects Version/s: (was: 0.23.2)
   (was: 0.24.0)
   0.23.3
   2.0.0-alpha
  Key: YARN-560  (was: MAPREDUCE-3949)
  Project: Hadoop YARN  (was: Hadoop Map/Reduce)

 If AM fails due to overrunning resource limits, error not visible through UI 
 sometimes
 --

 Key: YARN-560
 URL: https://issues.apache.org/jira/browse/YARN-560
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.0-alpha, 0.23.3
Reporter: Todd Lipcon
Assignee: Ravi Prakash
Priority: Minor
 Attachments: MAPREDUCE-3949.patch


 I had a case where an MR AM eclipsed the configured memory limit. This caused 
 the AM's container to get killed, but nowhere accessible through the web UI 
 showed these diagnostics. I had to go view the NM's logs via ssh before I 
 could figure out what had happened to my application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-560) If AM fails due to overrunning resource limits, error not visible through UI sometimes

2013-04-09 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-560:
-

Labels: usability  (was: )

 If AM fails due to overrunning resource limits, error not visible through UI 
 sometimes
 --

 Key: YARN-560
 URL: https://issues.apache.org/jira/browse/YARN-560
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Todd Lipcon
Assignee: Ravi Prakash
Priority: Minor
  Labels: usability
 Attachments: MAPREDUCE-3949.patch


 I had a case where an MR AM eclipsed the configured memory limit. This caused 
 the AM's container to get killed, but nowhere accessible through the web UI 
 showed these diagnostics. I had to go view the NM's logs via ssh before I 
 could figure out what had happened to my application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-451) Add more metrics to RM page

2013-04-09 Thread Lohit Vijayarenu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627203#comment-13627203
 ] 

Lohit Vijayarenu commented on YARN-451:
---

I tried to see if adding the total number of containers was a trivial change, but it 
looks like there is no notion of an application's maximum resource available to the 
ResourceManager. This might be the reason why the RM page did not have this 
information. Looking into RMAppImpl shows that this kind of information is not 
passed from the Client/AM to the RM during application initialization either. 

The closest thing to a notion of job weight I could see was resource demand, but 
that seems to change based on how an application requests containers. For example, 
the FairScheduler seems to recalculate fair share based on how much resource 
demand is passed by applications. 

One option I can think of is to add an additional protobuf field which specifies 
the total number of containers/resources an application might use. This would be 
an optional field, used only by MapReduce for now, and the client could set this 
value based on the number of mappers/reducers. I am not sure if this is the right 
approach; are there any other, simpler ideas people can suggest?
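
Purely as an illustration of the proposal above, here is a sketch of the kind of 
client-side estimate such an optional field might carry; nothing here exists in the 
current protocol, and all names are hypothetical.

{code}
// Hypothetical sketch only: a "total containers" field is not part of the
// existing YARN submission protocol; this just shows the estimate a
// MapReduce client could compute and attach.
class AppContainerEstimate {
  private final int totalContainers;

  AppContainerEstimate(int numMappers, int numReducers) {
    // One container per map and reduce task, plus one for the AM itself.
    this.totalContainers = numMappers + numReducers + 1;
  }

  int getTotalContainers() {
    return totalContainers;
  }
}
{code}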

 Add more metrics to RM page
 ---

 Key: YARN-451
 URL: https://issues.apache.org/jira/browse/YARN-451
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Priority: Minor

 ResourceManager webUI shows the list of RUNNING applications, but it does not 
 show which applications are requesting more resources than others. With a 
 cluster running hundreds of applications at once, it would be useful to have 
 some kind of metric to distinguish high-resource-usage applications from 
 low-resource-usage ones. At a minimum, showing the number of containers would be 
 a good option.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-117) Enhance YARN service model

2013-04-09 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627215#comment-13627215
 ] 

Steve Loughran commented on YARN-117:
-

One test is failing; it looks like a service isn't shutting down reliably, so the 
next test in the same JVM can't bind to its port.
{code}
org.apache.hadoop.yarn.YarnException: java.net.BindException: Problem binding 
to [localhost:12345] java.net.BindException: Address already in use; For more 
details see:  http://wiki.apache.org/hadoop/BindException
at 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
at 
org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:63)
at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:52)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.innerStart(ContainerManagerImpl.java:230)
at 
org.apache.hadoop.yarn.service.AbstractService.start(AbstractService.java:175)
at 
org.apache.hadoop.yarn.service.CompositeService.innerStart(CompositeService.java:76)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.innerStart(NodeManager.java:209)
at 
org.apache.hadoop.yarn.service.AbstractService.start(AbstractService.java:175)
at 
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeReboot(TestNodeStatusUpdater.java:726)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at 
org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at 
org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
Caused by: java.net.BindException: Problem binding to [localhost:12345] 
java.net.BindException: Address already in use; For more details see:  
http://wiki.apache.org/hadoop/BindException
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:716)
at org.apache.hadoop.ipc.Server.bind(Server.java:415)
at org.apache.hadoop.ipc.Server$Listener.init(Server.java:518)
at org.apache.hadoop.ipc.Server.init(Server.java:1963)
at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:986)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server.init(ProtobufRpcEngine.java:427)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:402)
at 

[jira] [Updated] (YARN-117) Enhance YARN service model

2013-04-09 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-117:


Attachment: YARN-117-3.patch

Updated patch:
# fixed javadoc warnings (enum @links that the IDE felt were valid, but javadoc 
did not)
# added more rigorous shutdown in the test teardown, and made sure the 
overridden {{NodeManager}} overrides {{innerStop()}} and not {{stop()}} (see the 
sketch below)
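
For readers new to the discussion, the final-lifecycle-method pattern referred to 
above (and described in the issue text below) looks roughly like this; it is a 
simplified stand-in, not the actual AbstractService code in the patch.

{code}
// Minimal sketch of "final lifecycle methods delegating to inner methods".
abstract class SketchService {
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  private final Object stateLock = new Object();
  private State state = State.NOTINITED;

  // final: subclasses can no longer skip or reorder the state checks.
  public final void start() {
    synchronized (stateLock) {          // guards against races, e.g. two starters
      if (state != State.INITED) {
        throw new IllegalStateException("cannot start from " + state);
      }
      state = State.STARTED;
    }
    innerStart();                       // subclass work happens after the check
  }

  // stop() is valid from any state and must cope with null fields.
  public final void stop() {
    synchronized (stateLock) {
      if (state == State.STOPPED) {
        return;                         // a duplicate stop() is a no-op
      }
      state = State.STOPPED;
    }
    innerStop();
  }

  protected void innerStart() {}
  protected void innerStop() {}
}
{code}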

 Enhance YARN service model
 --

 Key: YARN-117
 URL: https://issues.apache.org/jira/browse/YARN-117
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117-2.patch, YARN-117-3.patch, YARN-117.patch


 Having played with the YARN service model, there are some issues 
 that I've identified based on past work and initial use.
 This JIRA issue is an overall one to cover the issues, with solutions pushed 
 out to separate JIRAs.
 h2. state model prevents stopped state being entered if you could not 
 successfully start the service.
 In the current lifecycle you cannot stop a service unless it was successfully 
 started, but
 * {{init()}} may acquire resources that need to be explicitly released
 * if the {{start()}} operation fails partway through, the {{stop()}} 
 operation may be needed to release resources.
 *Fix:* make {{stop()}} a valid state transition from all states and require 
 the implementations to be able to stop safely without requiring all fields to 
 be non-null.
 Before anyone points out that the {{stop()}} operations assume that all 
 fields are valid and will NPE if called before {{start()}}: MAPREDUCE-3431 shows 
 that this problem arises today, and MAPREDUCE-3502 is a fix for it. It is 
 independent of the rest of the issues in this doc, but it will aid making 
 {{stop()}} execute from all states other than stopped.
 MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
 review and uptake; this can be done with issues linked to this one.
 h2. AbstractService doesn't prevent duplicate state change requests.
 The {{ensureState()}} checks that verify whether or not a state transition is 
 allowed from the current state are performed in the base {{AbstractService}} 
 class -yet subclasses tend to call this *after* their own {{init()}}, 
 {{start()}} and {{stop()}} operations. This means that these operations can be 
 performed out of order, and even if the outcome of the call is an exception, 
 all actions performed by the subclasses will have taken place. MAPREDUCE-3877 
 demonstrates this.
 This is a tricky one to address. In HADOOP-3128 I used a base class instead 
 of an interface and made the {{init()}}, {{start()}} and {{stop()}} methods 
 {{final}}. These methods would do the checks, and then invoke protected inner 
 methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to 
 retrofit the same behaviour to everything that extends {{AbstractService}} 
 -something that must be done before the class is considered stable (because 
 once the lifecycle methods are declared final, all subclasses that are out of 
 the source tree will need fixing by the respective developers).
 h2. AbstractService state change doesn't defend against race conditions.
 There are no concurrency locks on the state transitions. Whatever fix for wrong 
 state calls is added should correct this to prevent re-entrancy, such as 
 {{stop()}} being called from two threads.
 h2. Static methods to choreograph lifecycle operations
 Helper methods to move things through lifecycles. init-start is common, 
 stop-if-service!=null is another. Some static methods can execute these, and 
 even call {{stop()}} if {{init()}} raises an exception. These could go into a 
 class {{ServiceOps}} in the same package. These can be used by those services 
 that wrap other services, and help manage more robust shutdowns.
 h2. state transition failures are something that registered service listeners 
 may wish to be informed of.
 When a state transition fails, a {{RuntimeException}} can be thrown -and the 
 service listeners are not informed, as the notification point isn't reached. 
 They may wish to know this, especially for management and diagnostics.
 *Fix:* extend {{ServiceStateChangeListener}} with a callback such as 
 {{stateChangeFailed(Service service, Service.State targeted-state, 
 RuntimeException e)}} that is invoked from the (final) state change methods 
 in the {{AbstractService}} class (once they delegate to their inner 
 {{innerStart()}}, {{innerStop()}} methods); make it a no-op in the existing 
 implementations of the interface.
 h2. Service listener failures not handled
 Is this an error or not? Log and ignore may not be what is desired.
 *Proposed:* during {{stop()}} any exception by a listener is caught and 
 discarded, to increase the likelihood 

[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-04-09 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-530:


Attachment: YARN-530-3.patch

 Define Service model strictly, implement AbstractService for robust 
 subclassing, migrate yarn-common services
 -

 Key: YARN-530
 URL: https://issues.apache.org/jira/browse/YARN-530
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117changes.pdf, YARN-530-2.patch, YARN-530-3.patch, 
 YARN-530.patch


 # Extend the YARN {{Service}} interface as discussed in YARN-117
 # Implement the changes in {{AbstractService}} and {{FilterService}}.
 # Migrate all services in yarn-common to the more robust service model, test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-495) Change NM behavior of reboot to resync

2013-04-09 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-495:
-

Attachment: YARN-495.3.patch

fix merge conflicts

 Change NM behavior of reboot to resync
 --

 Key: YARN-495
 URL: https://issues.apache.org/jira/browse/YARN-495
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-495.1.patch, YARN-495.2.patch, YARN-495.3.patch


 When a reboot command is sent from the RM, the node manager doesn't clean up the 
 containers while it is stopping.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-04-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627240#comment-13627240
 ] 

Hadoop QA commented on YARN-530:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577911/YARN-530-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/697//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/697//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/697//console

This message is automatically generated.

 Define Service model strictly, implement AbstractService for robust 
 subclassing, migrate yarn-common services
 -

 Key: YARN-530
 URL: https://issues.apache.org/jira/browse/YARN-530
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117changes.pdf, YARN-530-2.patch, YARN-530-3.patch, 
 YARN-530.patch


 # Extend the YARN {{Service}} interface as discussed in YARN-117
 # Implement the changes in {{AbstractService}} and {{FilterService}}.
 # Migrate all services in yarn-common to the more robust service model, test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-503) DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false

2013-04-09 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated YARN-503:
-

Attachment: YARN-503.patch

Fixed the findbugs issues.  Upon review of those issues, I realized I was trying 
too hard to prevent tight and unlikely race conditions.

I simplified the design further, which allowed for the removal of virtually all 
the synchronization that could lead to tricky deadlocks.

Updated the tests to ensure apps and/or tokens don't leak.
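
The reference-counting idea mentioned in the issue description below can be sketched 
as follows; the generic tracker here is a simplified stand-in, not the actual 
DelegationTokenRenewer data structures used in the patch.

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the "reference count / list of jobIds for shared tokens" idea
// from the issue description; types are simplified stand-ins.
class SharedTokenTracker<TOKEN, APPID> {
  private final Map<TOKEN, Set<APPID>> appsPerToken = new HashMap<>();

  synchronized void register(TOKEN token, APPID app) {
    appsPerToken.computeIfAbsent(token, t -> new HashSet<>()).add(app);
  }

  /**
   * Called when an application finishes; returns true only when this was the
   * last application using the token, i.e. it is now safe to cancel it.
   */
  synchronized boolean unregister(TOKEN token, APPID app) {
    Set<APPID> apps = appsPerToken.get(token);
    if (apps == null) {
      return false;
    }
    apps.remove(app);
    if (apps.isEmpty()) {
      appsPerToken.remove(token);
      return true;
    }
    return false;
  }
}
{code}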

 DelegationTokens will be renewed forever if multiple jobs share tokens and 
 the first one sets JOB_CANCEL_DELEGATION_TOKEN to false
 --

 Key: YARN-503
 URL: https://issues.apache.org/jira/browse/YARN-503
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha
Reporter: Siddharth Seth
Assignee: Daryn Sharp
 Attachments: YARN-503.patch, YARN-503.patch


 The first Job/App to register a token is the one which DelegationTokenRenewer 
 associates with a specific Token. An attempt to remove/cancel these shared 
 tokens by subsequent jobs doesn't work, since the JobId will not match.
 As a result, even if subsequent jobs have 
 MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN set to true, tokens will not be 
 cancelled when those jobs complete.
 Tokens will eventually be removed from the RM / JT when the service that 
 issued them considers them to have expired, or via an explicit 
 cancelDelegationTokens call (not implemented yet in 23).
 A side effect of this is that the same delegation token will end up being 
 renewed multiple times (a separate TimerTask for each job which uses the 
 token).
 DelegationTokenRenewer could maintain a reference count/list of jobIds for 
 shared tokens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-503) DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false

2013-04-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627271#comment-13627271
 ] 

Hadoop QA commented on YARN-503:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577915/YARN-503.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/698//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/698//console

This message is automatically generated.

 DelegationTokens will be renewed forever if multiple jobs share tokens and 
 the first one sets JOB_CANCEL_DELEGATION_TOKEN to false
 --

 Key: YARN-503
 URL: https://issues.apache.org/jira/browse/YARN-503
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha
Reporter: Siddharth Seth
Assignee: Daryn Sharp
 Attachments: YARN-503.patch, YARN-503.patch


 The first Job/App to register a token is the one which DelegationTokenRenewer 
 associates with a specific Token. An attempt to remove/cancel these shared 
 tokens by subsequent jobs doesn't work, since the JobId will not match.
 As a result, even if subsequent jobs have 
 MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN set to true, tokens will not be 
 cancelled when those jobs complete.
 Tokens will eventually be removed from the RM / JT when the service that 
 issued them considers them to have expired, or via an explicit 
 cancelDelegationTokens call (not implemented yet in 23).
 A side effect of this is that the same delegation token will end up being 
 renewed multiple times (a separate TimerTask for each job which uses the 
 token).
 DelegationTokenRenewer could maintain a reference count/list of jobIds for 
 shared tokens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-518) Fair Scheduler's document link could be added to the hadoop 2.x main doc page

2013-04-09 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-518:


Attachment: YARN-518.patch

 Fair Scheduler's document link could be added to the hadoop 2.x main doc page
 -

 Key: YARN-518
 URL: https://issues.apache.org/jira/browse/YARN-518
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.0.3-alpha
Reporter: Dapeng Sun
 Attachments: YARN-518.patch


 Currently the doc page for Fair Scheduler looks good and it’s here, 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html.
 It would be better to add the document link to the YARN section in the Hadoop 
 2.x main doc page, so that users can easily find the doc and experimentally 
 try the Fair Scheduler, as they can the Capacity Scheduler. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-518) Fair Scheduler's document link could be added to the hadoop 2.x main doc page

2013-04-09 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627282#comment-13627282
 ] 

Sandy Ryza commented on YARN-518:
-

Uploaded a simple patch that adds a link to the fair scheduler to the index page

 Fair Scheduler's document link could be added to the hadoop 2.x main doc page
 -

 Key: YARN-518
 URL: https://issues.apache.org/jira/browse/YARN-518
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.0.3-alpha
Reporter: Dapeng Sun
Assignee: Sandy Ryza
 Attachments: YARN-518.patch


 Currently the doc page for Fair Scheduler looks good and it’s here, 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html.
 It would be better to add the document link to the YARN section in the Hadoop 
 2.x main doc page, so that users can easily find the doc and experimentally 
 try the Fair Scheduler, as they can the Capacity Scheduler. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-518) Fair Scheduler's document link could be added to the hadoop 2.x main doc page

2013-04-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627290#comment-13627290
 ] 

Hadoop QA commented on YARN-518:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577924/YARN-518.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/699//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/699//console

This message is automatically generated.

 Fair Scheduler's document link could be added to the hadoop 2.x main doc page
 -

 Key: YARN-518
 URL: https://issues.apache.org/jira/browse/YARN-518
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.0.3-alpha
Reporter: Dapeng Sun
Assignee: Sandy Ryza
 Attachments: YARN-518.patch


 Currently the doc page for Fair Scheduler looks good and it’s here, 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html.
 It would be better to add the document link to the YARN section in the Hadoop 
 2.x main doc page, so that users can easily find the doc and experimentally 
 try the Fair Scheduler, as they can the Capacity Scheduler. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.

2013-04-09 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-561:


 Summary: Nodemanager should set some key information into the 
environment of every container that it launches.
 Key: YARN-561
 URL: https://issues.apache.org/jira/browse/YARN-561
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah


Information such as the containerId, nodemanager hostname, and nodemanager port is 
not set in the environment when any container is launched. 

For an AM, the RM does all of this for it, but for a container launched by an 
application, all of the above need to be set by the ApplicationMaster. 

At a minimum, the container id would be a useful piece of information. If the 
container wishes to talk to its local NM, the nodemanager-related information 
would also come in handy. 
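
To illustrate the status quo described above, here is a sketch of what an 
ApplicationMaster currently has to do by hand; the environment variable names are 
hypothetical placeholders, not keys that the NodeManager actually sets today.

{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: the environment keys below are made-up placeholders.
class ContainerEnvSketch {
  static Map<String, String> buildEnv(String containerId, String nmHost, int nmPort) {
    Map<String, String> env = new HashMap<>();
    // Today the ApplicationMaster has to set these itself before launching
    // the container; the proposal is for the NM to inject them instead.
    env.put("EXAMPLE_CONTAINER_ID", containerId);
    env.put("EXAMPLE_NM_HOST", nmHost);
    env.put("EXAMPLE_NM_PORT", Integer.toString(nmPort));
    return env;
  }
}
{code}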

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.

2013-04-09 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-561:
-

Issue Type: Sub-task  (was: Bug)
Parent: YARN-386

 Nodemanager should set some key information into the environment of every 
 container that it launches.
 -

 Key: YARN-561
 URL: https://issues.apache.org/jira/browse/YARN-561
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
  Labels: usability

 Information such as the containerId, nodemanager hostname, and nodemanager port is 
 not set in the environment when any container is launched. 
 For an AM, the RM does all of this for it, but for a container launched by an 
 application, all of the above need to be set by the ApplicationMaster. 
 At a minimum, the container id would be a useful piece of information. If the 
 container wishes to talk to its local NM, the nodemanager-related information 
 would also come in handy. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land

2013-04-09 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627467#comment-13627467
 ] 

Vinod Kumar Vavilapalli commented on YARN-486:
--

Please also fix long lines in ContainerRemoteLaunchEvent.

Filed MAPREDUCE-5139 for tracking MR only changes. Let's continue the patch 
here and post the final MR changes there.

 Change startContainer NM API to accept Container as a parameter and make 
 ContainerLaunchContext user land
 -

 Key: YARN-486
 URL: https://issues.apache.org/jira/browse/YARN-486
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-486.1.patch, YARN-486.2.patch, YARN-486.3.patch, 
 YARN-486.4.patch


 Currently, the id, resource request, etc. need to be copied over from Container to 
 ContainerLaunchContext. This can be brittle. It also leads to duplication of 
 information (such as Resource from CLC and Resource from Container, and 
 Container.tokens). Sending Container directly to startContainer solves these 
 problems. It also makes the CLC clean by only having stuff in it that is set by 
 the client/AM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-09 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-514:
-

Attachment: YARN-514.1.patch

In this patch, I've changed RMStateStore#storeApplication from a blocking API to 
a non-blocking API. Therefore, it is no longer necessary to invoke the API in 
ClientRMService#submitApplication. Instead, I defined a new state, named 
SAVING, between NEW and SUBMITTED of RMApp. TestRMAppTransitions was modified 
to test the additional state transition, and to test whether the application is 
stored before SUBMITTED and removed after FINISHED.

An additional issue is that the mapping between YARN and MapReduce states needs 
to be updated due to the newly added state. This will be filed and solved in a 
separate MR jira.
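
A minimal sketch of the non-blocking shape described above, with simplified 
stand-ins for RMApp and RMStateStore; only the state names mirror the comment, 
everything else here is illustrative and not the actual patch.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the async-store idea: submit returns immediately, the app sits
// in SAVING until the store callback moves it to SUBMITTED.
class AsyncStoreSketch {
  enum AppState { NEW, SAVING, SUBMITTED, FINISHED }

  private final ExecutorService storeExecutor = Executors.newSingleThreadExecutor();
  private volatile AppState state = AppState.NEW;

  // Client-facing submit: kicks off the store and returns immediately, so
  // slow store operations no longer block client threads.
  void submitApplication(Runnable storeOperation) {
    state = AppState.SAVING;
    storeExecutor.execute(() -> {
      storeOperation.run();          // persist the application
      onStored();                    // async "stored" notification
    });
  }

  private void onStored() {
    state = AppState.SUBMITTED;      // SAVING -> SUBMITTED on completion
  }

  AppState getState() {
    return state;
  }
}
{code}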

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch


 Currently, app submission is the only store operation performed synchronously 
 because the app must be stored before the request returns with success. This 
 makes the RM susceptible to blocking all client threads on slow store 
 operations, resulting in RM being perceived as unavailable by clients.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627489#comment-13627489
 ] 

Hadoop QA commented on YARN-514:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577953/YARN-514.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/700//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/700//console

This message is automatically generated.

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch


 Currently, app submission is the only store operation performed synchronously 
 because the app must be stored before the request returns with success. This 
 makes the RM susceptible to blocking all client threads on slow store 
 operations, resulting in RM being perceived as unavailable by clients.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira