[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-04-03 Thread Roger Hoover (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621759#comment-13621759
 ] 

Roger Hoover commented on YARN-412:
---

[~acmurthy], I've got the patch back in shape.  Can you please review it or let 
me know what the next step is?


> FifoScheduler incorrectly checking for node locality
> 
>
> Key: YARN-412
> URL: https://issues.apache.org/jira/browse/YARN-412
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
>  Labels: patch
> Attachments: YARN-412.patch
>
>
> In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
> data is local to a node by searching for the nodeAddress of the node in the 
> set of outstanding requests for the app.  This seems to be incorrect as it 
> should be checking hostname instead.  The offending line of code is 455:
> application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
> Requests are formatted by hostname (e.g. host1.foo.com), whereas node addresses 
> are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
> In the CapacityScheduler, it's done using hostname.  See 
> LeafQueue.assignNodeLocalContainers, line 1129:
> application.getResourceRequest(priority, node.getHostName());
> Note that this bug does not affect the actual scheduling decisions made by 
> the FifoScheduler because even though it incorrectly determines that a request 
> is not local to the node, it will still schedule the request immediately 
> because it's rack-local.  However, this bug may be adversely affecting the 
> reporting of job status by underreporting the number of tasks that were node 
> local.
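
For context, here is a small, self-contained illustration of the key mismatch described above. This is standalone demo code, not the scheduler code or the attached patch; the HashMap merely stands in for the app's set of outstanding requests.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Outstanding requests are keyed by hostname, but the FifoScheduler looked them up
// by nodeAddress (hostname:port), so the node-local lookup always missed.
public class LocalityLookupDemo {
  public static void main(String[] args) {
    Map<String, Integer> outstandingRequests = new HashMap<String, Integer>();
    outstandingRequests.put("host1.foo.com", 3);               // keyed by hostname

    String nodeAddress = "host1.foo.com:1234";                 // hostname + command port
    String hostName = "host1.foo.com";

    System.out.println(outstandingRequests.get(nodeAddress));  // null -> looks non-local
    System.out.println(outstandingRequests.get(hostName));     // 3    -> correct node-local match
  }
}
{code}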

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-04-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621735#comment-13621735
 ] 

Hadoop QA commented on YARN-412:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576914/YARN-412.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/669//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/669//console

This message is automatically generated.

> FifoScheduler incorrectly checking for node locality
> 
>
> Key: YARN-412
> URL: https://issues.apache.org/jira/browse/YARN-412
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
>  Labels: patch
> Attachments: YARN-412.patch
>
>
> In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
> data is local to a node by searching for the nodeAddress of the node in the 
> set of outstanding requests for the app.  This seems to be incorrect as it 
> should be checking hostname instead.  The offending line of code is 455:
> application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
> Requests are formatted by hostname (e.g. host1.foo.com), whereas node addresses 
> are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
> In the CapacityScheduler, it's done using hostname.  See 
> LeafQueue.assignNodeLocalContainers, line 1129:
> application.getResourceRequest(priority, node.getHostName());
> Note that this bug does not affect the actual scheduling decisions made by 
> the FifoScheduler because even though it incorrectly determines that a request 
> is not local to the node, it will still schedule the request immediately 
> because it's rack-local.  However, this bug may be adversely affecting the 
> reporting of job status by underreporting the number of tasks that were node 
> local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object

2013-04-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621733#comment-13621733
 ] 

Hudson commented on YARN-536:
-

Integrated in Hadoop-trunk-Commit #3560 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3560/])
YARN-536. Removed the unused objects ContainerStatus and ContainerState 
from Container which also don't belong to the container. Contributed by Xuan 
Gong. (Revision 1464271)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464271
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Container.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java


> Remove ContainerStatus, ContainerState from Container api interface as they 
> will not be called by the container object
> --
>
> Key: YARN-536
> URL: https://issues.apache.org/jira/browse/YARN-536
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.0.5-beta
>
> Attachments: YARN-536.1.patch, YARN-536.2.patch
>
>
> Remove containerstate, containerStatus from container interface. They will 
> not be called by container object

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object

2013-04-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621720#comment-13621720
 ] 

Vinod Kumar Vavilapalli commented on YARN-536:
--

+1, this looks good. Checking it in.

> Remove ContainerStatus, ContainerState from Container api interface as they 
> will not be called by the container object
> --
>
> Key: YARN-536
> URL: https://issues.apache.org/jira/browse/YARN-536
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-536.1.patch, YARN-536.2.patch
>
>
> Remove containerstate, containerStatus from container interface. They will 
> not be called by container object

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-04-03 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621709#comment-13621709
 ] 

Hitesh Shah commented on YARN-412:
--

@Roger, for future reference (may not be applicable to this jira), it is good 
to leave earlier patch attachments lying around and not delete them when 
uploading newer patches. They can be used to trace review comments, feedback, etc.

As for the hadoop-common mvn eclipse:eclipse failure, it can be ignored for now. It is a 
known issue with an open jira that has not been addressed yet.

> FifoScheduler incorrectly checking for node locality
> 
>
> Key: YARN-412
> URL: https://issues.apache.org/jira/browse/YARN-412
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
>  Labels: patch
> Attachments: YARN-412.patch
>
>
> In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
> data is local to a node by searching for the nodeAddress of the node in the 
> set of outstanding requests for the app.  This seems to be incorrect as it 
> should be checking hostname instead.  The offending line of code is 455:
> application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
> Requests are formatted by hostname (e.g. host1.foo.com), whereas node addresses 
> are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
> In the CapacityScheduler, it's done using hostname.  See 
> LeafQueue.assignNodeLocalContainers, line 1129:
> application.getResourceRequest(priority, node.getHostName());
> Note that this bug does not affect the actual scheduling decisions made by 
> the FifoScheduler because even though it incorrectly determines that a request 
> is not local to the node, it will still schedule the request immediately 
> because it's rack-local.  However, this bug may be adversely affecting the 
> reporting of job status by underreporting the number of tasks that were node 
> local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-04-03 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated YARN-412:
--

Attachment: (was: YARN-412.patch)

> FifoScheduler incorrectly checking for node locality
> 
>
> Key: YARN-412
> URL: https://issues.apache.org/jira/browse/YARN-412
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
>  Labels: patch
> Attachments: YARN-412.patch
>
>
> In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
> data is local to a node by searching for the nodeAddress of the node in the 
> set of outstanding requests for the app.  This seems to be incorrect as it 
> should be checking hostname instead.  The offending line of code is 455:
> application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
> Requests are formatted by hostname (e.g. host1.foo.com), whereas node addresses 
> are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
> In the CapacityScheduler, it's done using hostname.  See 
> LeafQueue.assignNodeLocalContainers, line 1129:
> application.getResourceRequest(priority, node.getHostName());
> Note that this bug does not affect the actual scheduling decisions made by 
> the FifoScheduler because even though it incorrectly determines that a request 
> is not local to the node, it will still schedule the request immediately 
> because it's rack-local.  However, this bug may be adversely affecting the 
> reporting of job status by underreporting the number of tasks that were node 
> local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-04-03 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated YARN-412:
--

Attachment: YARN-412.patch

> FifoScheduler incorrectly checking for node locality
> 
>
> Key: YARN-412
> URL: https://issues.apache.org/jira/browse/YARN-412
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
>  Labels: patch
> Attachments: YARN-412.patch
>
>
> In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
> data is local to a node by searching for the nodeAddress of the node in the 
> set of outstanding requests for the app.  This seems to be incorrect as it 
> should be checking hostname instead.  The offending line of code is 455:
> application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
> Requests are formatted by hostname (e.g. host1.foo.com), whereas node addresses 
> are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
> In the CapacityScheduler, it's done using hostname.  See 
> LeafQueue.assignNodeLocalContainers, line 1129:
> application.getResourceRequest(priority, node.getHostName());
> Note that this bug does not affect the actual scheduling decisions made by 
> the FifoScheduler because even though it incorrectly determines that a request 
> is not local to the node, it will still schedule the request immediately 
> because it's rack-local.  However, this bug may be adversely affecting the 
> reporting of job status by underreporting the number of tasks that were node 
> local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs

2013-04-03 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621573#comment-13621573
 ] 

Chris Nauroth commented on YARN-535:


{{TestDistributedShell#setup}} has nearly identical code to overwrite 
yarn-site.xml.

> TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during 
> write phase, breaks later test runs
> 
>
> Key: YARN-535
> URL: https://issues.apache.org/jira/browse/YARN-535
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Affects Versions: 3.0.0
> Environment: OS/X laptop, HFS+ filesystem
>Reporter: Steve Loughran
>Priority: Minor
>
> The setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. 
> As {{Configuration.writeXml()}} re-reads all resources, this will 
> break if the (open-for-writing) resource is already visible as an empty file. 
> This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks 
> later test runs, because it is not overwritten by later incremental builds 
> due to timestamps.
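
One possible way to avoid the half-written resource, sketched under the assumption that the test only needs the final file visible on the classpath: serialize the configuration to a temporary file and then move it over {{yarn-site.xml}}, so {{Configuration.writeXml()}} never re-reads an empty resource. This is only an illustration, not the fix tracked in this jira.

{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import org.apache.hadoop.conf.Configuration;

public class SafeYarnSiteWriter {
  // Write conf to a temp file in the same directory, then atomically replace the target,
  // so the resource is never visible as an open, empty file while it is being written.
  public static void writeYarnSite(Configuration conf, File target) throws IOException {
    File tmp = File.createTempFile("yarn-site", ".xml", target.getParentFile());
    OutputStream out = new FileOutputStream(tmp);
    try {
      conf.writeXml(out);
    } finally {
      out.close();
    }
    Files.move(tmp.toPath(), target.toPath(),
        StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
  }
}
{code}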

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-540) RM state store not cleaned if job succeeds but RM shutdown and restart-dispatcher stopped before it can process REMOVE_APP event

2013-04-03 Thread Jian He (JIRA)
Jian He created YARN-540:


 Summary: RM state store not cleaned if job succeeds but RM 
shutdown and restart-dispatcher stopped before it can process REMOVE_APP event
 Key: YARN-540
 URL: https://issues.apache.org/jira/browse/YARN-540
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He


When a job succeeds and successfully calls finishApplicationMaster, but the RM shuts 
down and restarts (the dispatcher is stopped before it can process the REMOVE_APP 
event), then the next time the RM comes back it will reload the existing state files 
even though the job succeeded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits

2013-04-03 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-467:
---

Attachment: yarn-467-testCode.tar

This will help in testing the distributed cache patch.

> Jobs fail during resource localization when public distributed-cache hits 
> unix directory limits
> ---
>
> Key: YARN-467
> URL: https://issues.apache.org/jira/browse/YARN-467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Fix For: 2.0.5-beta
>
> Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, 
> yarn-467-20130322.3.patch, yarn-467-20130322.patch, 
> yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, 
> yarn-467-20130401.patch, yarn-467-20130402.1.patch, 
> yarn-467-20130402.2.patch, yarn-467-20130402.patch, yarn-467-testCode.tar
>
>
> If we have multiple jobs which use the distributed cache with small 
> files, the directory limit is reached before the cache-size limit, and it fails 
> to create any directories in the file cache (PUBLIC). The jobs start failing with 
> the below exception.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 
> failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> We need to have a mechanism wherein we can create a directory hierarchy and 
> limit the number of files per directory.
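
As a rough illustration of that last sentence, the standalone snippet below derives a nested subdirectory from a monotonically increasing resource id so that no single cache directory holds more than a fixed number of entries. The base-36 layout and the PER_DIR_LIMIT value are hypothetical choices for the sketch, not the scheme used by the attached patches.

{code:java}
public class CacheDirLayout {
  private static final int PER_DIR_LIMIT = 36;    // assumed cap on entries per directory

  // Map a resource id to a relative subdirectory path, one base-36 digit per level.
  static String subDirFor(long resourceId) {
    StringBuilder path = new StringBuilder();
    long n = resourceId / PER_DIR_LIMIT;          // the first PER_DIR_LIMIT ids stay at the root
    while (n > 0) {
      path.insert(0, "/" + Long.toString(n % PER_DIR_LIMIT, 36));
      n /= PER_DIR_LIMIT;
    }
    return path.toString();
  }

  public static void main(String[] args) {
    System.out.println("filecache" + subDirFor(7));     // filecache
    System.out.println("filecache" + subDirFor(100));   // filecache/2
    System.out.println("filecache" + subDirFor(5000));  // filecache/3/u
  }
}
{code}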

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object

2013-04-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621390#comment-13621390
 ] 

Hadoop QA commented on YARN-536:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576856/YARN-536.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/667//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/667//console

This message is automatically generated.

> Remove ContainerStatus, ContainerState from Container api interface as they 
> will not be called by the container object
> --
>
> Key: YARN-536
> URL: https://issues.apache.org/jira/browse/YARN-536
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-536.1.patch, YARN-536.2.patch
>
>
> Remove containerstate, containerStatus from container interface. They will 
> not be called by container object

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-04-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621387#comment-13621387
 ] 

Hadoop QA commented on YARN-193:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576857/YARN-193.14.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/668//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/668//console

This message is automatically generated.

> Scheduler.normalizeRequest does not account for allocation requests that 
> exceed maximumAllocation limits 
> -
>
> Key: YARN-193
> URL: https://issues.apache.org/jira/browse/YARN-193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.2-alpha, 3.0.0
>Reporter: Hitesh Shah
>Assignee: Zhijie Shen
> Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
> MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, 
> YARN-193.13.patch, YARN-193.14.patch, YARN-193.4.patch, YARN-193.5.patch, 
> YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs

2013-04-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621385#comment-13621385
 ] 

Vinod Kumar Vavilapalli commented on YARN-458:
--

+1 for the patch after the fact. Thanks for doing this Sandy.

> YARN daemon addresses must be placed in many different configs
> --
>
> Key: YARN-458
> URL: https://issues.apache.org/jira/browse/YARN-458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.0.5-beta
>
> Attachments: YARN-458.patch
>
>
> The YARN resourcemanager's address is included in four different configs: 
> yarn.resourcemanager.scheduler.address, 
> yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, 
> and yarn.resourcemanager.admin.address
> A new user trying to configure a cluster needs to know the names of all these 
> four configs.
> The same issue exists for nodemanagers.
> It would be much easier if they could simply specify 
> yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports 
> for the other ones would kick in.
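
For illustration, here is a standalone sketch of the requested behavior: expand a single yarn.resourcemanager.hostname into the four host:port settings whenever they are not set explicitly. The port numbers are assumed defaults and the code is not the actual YarnConfiguration/yarn-default.xml change; it only shows the shape of the idea.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class RmAddressDefaults {
  // Derive the four RM addresses from one hostname, leaving any explicit setting alone.
  public static void applyHostnameDefaults(Configuration conf) {
    String host = conf.get("yarn.resourcemanager.hostname", "0.0.0.0");
    setIfUnset(conf, "yarn.resourcemanager.address",                  host + ":8032");
    setIfUnset(conf, "yarn.resourcemanager.scheduler.address",        host + ":8030");
    setIfUnset(conf, "yarn.resourcemanager.resource-tracker.address", host + ":8031");
    setIfUnset(conf, "yarn.resourcemanager.admin.address",            host + ":8033");
  }

  private static void setIfUnset(Configuration conf, String key, String value) {
    if (conf.get(key) == null) {
      conf.set(key, value);
    }
  }
}
{code}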

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-539) LocalizedResources are leaked in memory in case resource localization fails

2013-04-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-539:
-

Summary: LocalizedResources are leaked in memory in case resource 
localization fails  (was: Memory leak in case resource localization fails. 
LocalizedResource remains in memory.)

> LocalizedResources are leaked in memory in case resource localization fails
> ---
>
> Key: YARN-539
> URL: https://issues.apache.org/jira/browse/YARN-539
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
>
> If resource localization fails, then the resource remains in memory and is
> 1) either cleaned up the next time cache cleanup runs and there is a space 
> crunch (if sufficient space is available in the cache, it will remain in 
> memory), or
> 2) reused if a LocalizationRequest comes again for the same resource.
> I think when resource localization fails, that event should be sent to the 
> LocalResourceTracker, which will then remove it from its cache.
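
A very small sketch of that proposal, with hypothetical names standing in for the NodeManager's real event and tracker classes; it only shows the intended effect of the failure event, not the actual implementation.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LocalResourceTrackerSketch {
  // In-memory cache of localized resources, keyed by resource path (stand-in type).
  private final Map<String, Object> localizedResources = new ConcurrentHashMap<String, Object>();

  // On a localization-failed event, drop the entry so a later request re-localizes it
  // instead of reusing the stale in-memory object or waiting for a cache-space crunch.
  public void handleLocalizationFailed(String resourceKey, Throwable cause) {
    localizedResources.remove(resourceKey);
  }
}
{code}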

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-537) Waiting containers are not informed if private localization for a resource fails.

2013-04-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621381#comment-13621381
 ] 

Vinod Kumar Vavilapalli commented on YARN-537:
--

Yup, I put in a comment a long (long) time back asking why it isn't getting 
informed through the LocalizedResource, which knows about all the waiting 
containers. I think we should do that.

> Waiting containers are not informed if private localization for a resource 
> fails.
> -
>
> Key: YARN-537
> URL: https://issues.apache.org/jira/browse/YARN-537
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
>
> In ResourceLocalizationService.LocalizerRunner.update(), if localization fails, 
> then all the other waiting containers are not informed; only the initiator is 
> informed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-04-03 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-193:
-

Attachment: YARN-193.14.patch

Fixed the buggy test TestResourceManager#testResourceManagerInitConfigValidation

> Scheduler.normalizeRequest does not account for allocation requests that 
> exceed maximumAllocation limits 
> -
>
> Key: YARN-193
> URL: https://issues.apache.org/jira/browse/YARN-193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.2-alpha, 3.0.0
>Reporter: Hitesh Shah
>Assignee: Zhijie Shen
> Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
> MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, 
> YARN-193.13.patch, YARN-193.14.patch, YARN-193.4.patch, YARN-193.5.patch, 
> YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object

2013-04-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-536:
---

Attachment: YARN-536.2.patch

Fix the bug.


> Remove ContainerStatus, ContainerState from Container api interface as they 
> will not be called by the container object
> --
>
> Key: YARN-536
> URL: https://issues.apache.org/jira/browse/YARN-536
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-536.1.patch, YARN-536.2.patch
>
>
> Remove containerstate, containerStatus from container interface. They will 
> not be called by container object

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object

2013-04-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621343#comment-13621343
 ] 

Hadoop QA commented on YARN-536:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576843/YARN-536.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/664//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/664//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/664//console

This message is automatically generated.

> Remove ContainerStatus, ContainerState from Container api interface as they 
> will not be called by the container object
> --
>
> Key: YARN-536
> URL: https://issues.apache.org/jira/browse/YARN-536
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-536.1.patch
>
>
> Remove containerstate, containerStatus from container interface. They will 
> not be called by container object

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits

2013-04-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621339#comment-13621339
 ] 

Hadoop QA commented on YARN-99:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12576840/yarn-99-20130403.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/666//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/666//console

This message is automatically generated.

> Jobs fail during resource localization when private distributed-cache hits 
> unix directory limits
> 
>
> Key: YARN-99
> URL: https://issues.apache.org/jira/browse/YARN-99
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Devaraj K
>Assignee: Omkar Vinit Joshi
> Attachments: yarn-99-20130324.patch, yarn-99-20130403.1.patch, 
> yarn-99-20130403.patch
>
>
> If we have multiple jobs which use the distributed cache with small 
> files, the directory limit is reached before the cache-size limit, and it fails 
> to create any directories in the file cache. The jobs start failing with the 
> below exception.
> {code:xml}
> java.io.IOException: mkdir of 
> /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {code}
> We should have a mechanism to clean the cache files if it crosses a specified 
> number of directories, like the cache size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-04-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621338#comment-13621338
 ] 

Hadoop QA commented on YARN-412:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576838/YARN-412.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/665//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/665//console

This message is automatically generated.

> FifoScheduler incorrectly checking for node locality
> 
>
> Key: YARN-412
> URL: https://issues.apache.org/jira/browse/YARN-412
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
>  Labels: patch
> Attachments: YARN-412.patch
>
>
> In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
> data is local to a node by searching for the nodeAddress of the node in the 
> set of outstanding requests for the app.  This seems to be incorrect as it 
> should be checking hostname instead.  The offending line of code is 455:
> application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
> Requests are formatted by hostname (e.g. host1.foo.com), whereas node addresses 
> are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
> In the CapacityScheduler, it's done using hostname.  See 
> LeafQueue.assignNodeLocalContainers, line 1129:
> application.getResourceRequest(priority, node.getHostName());
> Note that this bug does not affect the actual scheduling decisions made by 
> the FifoScheduler because even though it incorrectly determines that a request 
> is not local to the node, it will still schedule the request immediately 
> because it's rack-local.  However, this bug may be adversely affecting the 
> reporting of job status by underreporting the number of tasks that were node 
> local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs

2013-04-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621333#comment-13621333
 ] 

Hudson commented on YARN-458:
-

Integrated in Hadoop-trunk-Commit #3556 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3556/])
YARN-458. YARN daemon addresses must be placed in many different configs. 
(sandyr via tucu) (Revision 1464204)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464204
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


> YARN daemon addresses must be placed in many different configs
> --
>
> Key: YARN-458
> URL: https://issues.apache.org/jira/browse/YARN-458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.0.5-beta
>
> Attachments: YARN-458.patch
>
>
> The YARN resourcemanager's address is included in four different configs: 
> yarn.resourcemanager.scheduler.address, 
> yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, 
> and yarn.resourcemanager.admin.address
> A new user trying to configure a cluster needs to know the names of all these 
> four configs.
> The same issue exists for nodemanagers.
> It would be much easier if they could simply specify 
> yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports 
> for the other ones would kick in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-538) RM address DNS lookup can cause unnecessary slowness on every JHS page load

2013-04-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621323#comment-13621323
 ] 

Hudson commented on YARN-538:
-

Integrated in Hadoop-trunk-Commit #3555 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3555/])
YARN-538. RM address DNS lookup can cause unnecessary slowness on every JHS 
page load. (sandyr via tucu) (Revision 1464197)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464197
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> RM address DNS lookup can cause unnecessary slowness on every JHS page load 
> 
>
> Key: YARN-538
> URL: https://issues.apache.org/jira/browse/YARN-538
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.0.5-beta
>
> Attachments: MAPREDUCE-5111.patch
>
>
> When I run the job history server locally, every page load takes in the 10s 
> of seconds.  I profiled the process and discovered that all the extra time 
> was spent inside YarnConfiguration#getRMWebAppURL, trying to resolve 0.0.0.0 
> to a hostname.  When I changed my yarn.resourcemanager.address to localhost, 
> the page load times decreased drastically.
> There's no reason that we need to perform this resolution on every page load.
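
A minimal sketch of that last point: resolve the RM web app URL once and reuse it, instead of doing the hostname lookup on every page render. The Supplier is a hypothetical stand-in for the YarnConfiguration#getRMWebAppURL call; this is not the committed change (see the attached patch).

{code:java}
import java.util.function.Supplier;

public class CachedRmWebAppUrl {
  private final Supplier<String> resolver;   // e.g. () -> YarnConfiguration.getRMWebAppURL(conf)
  private volatile String cached;            // resolved at most once, then reused

  public CachedRmWebAppUrl(Supplier<String> resolver) {
    this.resolver = resolver;
  }

  public String get() {
    String url = cached;
    if (url == null) {
      synchronized (this) {
        if (cached == null) {
          cached = resolver.get();           // the only place the DNS lookup happens
        }
        url = cached;
      }
    }
    return url;
  }
}
{code}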

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing

2013-04-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621324#comment-13621324
 ] 

Hudson commented on YARN-516:
-

Integrated in Hadoop-trunk-Commit #3555 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3555/])
Revert YARN-516 per HADOOP-9357. (Revision 1464181)

 Result = SUCCESS
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464181
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java


> TestContainerLocalizer.testContainerLocalizerMain is failing
> 
>
> Key: YARN-516
> URL: https://issues.apache.org/jira/browse/YARN-516
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Andrew Wang
> Fix For: 2.0.5-beta
>
> Attachments: YARN-516.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs

2013-04-03 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621319#comment-13621319
 ] 

Alejandro Abdelnur commented on YARN-458:
-

+1. Do we need to do this for HS as well? If so please open a new JIRA.

> YARN daemon addresses must be placed in many different configs
> --
>
> Key: YARN-458
> URL: https://issues.apache.org/jira/browse/YARN-458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-458.patch
>
>
> The YARN resourcemanager's address is included in four different configs: 
> yarn.resourcemanager.scheduler.address, 
> yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, 
> and yarn.resourcemanager.admin.address
> A new user trying to configure a cluster needs to know the names of all these 
> four configs.
> The same issue exists for nodemanagers.
> It would be much easier if they could simply specify 
> yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports 
> for the other ones would kick in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-539) Memory leak in case resource localization fails. LocalizedResource remains in memory.

2013-04-03 Thread Omkar Vinit Joshi (JIRA)
Omkar Vinit Joshi created YARN-539:
--

 Summary: Memory leak in case resource localization fails. 
LocalizedResource remains in memory.
 Key: YARN-539
 URL: https://issues.apache.org/jira/browse/YARN-539
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi


If resource localization fails, then the resource remains in memory and is
1) either cleaned up the next time cache cleanup runs and there is a space 
crunch (if sufficient space is available in the cache, it will remain in 
memory), or
2) reused if a LocalizationRequest comes again for the same resource.

I think when resource localization fails, that event should be sent to the 
LocalResourceTracker, which will then remove it from its cache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Moved] (YARN-538) RM address DNS lookup can cause unnecessary slowness on every JHS page load

2013-04-03 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur moved MAPREDUCE-5111 to YARN-538:


  Component/s: (was: jobhistoryserver)
Affects Version/s: (was: 2.0.3-alpha)
   2.0.3-alpha
  Key: YARN-538  (was: MAPREDUCE-5111)
  Project: Hadoop YARN  (was: Hadoop Map/Reduce)

> RM address DNS lookup can cause unnecessary slowness on every JHS page load 
> 
>
> Key: YARN-538
> URL: https://issues.apache.org/jira/browse/YARN-538
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5111.patch
>
>
> When I run the job history server locally, every page load takes in the 10s 
> of seconds.  I profiled the process and discovered that all the extra time 
> was spent inside YarnConfiguration#getRMWebAppURL, trying to resolve 0.0.0.0 
> to a hostname.  When I changed my yarn.resourcemanager.address to localhost, 
> the page load times decreased drastically.
> There's no reason that we need to perform this resolution on every page load.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-537) Waiting containers are not informed if private localization for a resource fails.

2013-04-03 Thread Omkar Vinit Joshi (JIRA)
Omkar Vinit Joshi created YARN-537:
--

 Summary: Waiting containers are not informed if private 
localization for a resource fails.
 Key: YARN-537
 URL: https://issues.apache.org/jira/browse/YARN-537
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi


In ResourceLocalizationService.LocalizerRunner.update(), if localization fails, 
then all the other waiting containers are not informed; only the initiator is 
informed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object

2013-04-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-536:
---

Attachment: YARN-536.1.patch

> Remove ContainerStatus, ContainerState from Container api interface as they 
> will not be called by the container object
> --
>
> Key: YARN-536
> URL: https://issues.apache.org/jira/browse/YARN-536
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-536.1.patch
>
>
> Remove containerstate, containerStatus from container interface. They will 
> not be called by container object

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits

2013-04-03 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-99:
--

Attachment: yarn-99-20130403.1.patch

> Jobs fail during resource localization when private distributed-cache hits 
> unix directory limits
> 
>
> Key: YARN-99
> URL: https://issues.apache.org/jira/browse/YARN-99
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Devaraj K
>Assignee: Omkar Vinit Joshi
> Attachments: yarn-99-20130324.patch, yarn-99-20130403.1.patch, 
> yarn-99-20130403.patch
>
>
> If we have multiple jobs which use the distributed cache with small 
> files, the directory limit is reached before the cache-size limit, and it fails 
> to create any directories in the file cache. The jobs start failing with the 
> below exception.
> {code:xml}
> java.io.IOException: mkdir of 
> /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {code}
> We should have a mechanism to clean the cache files if it crosses a specified 
> number of directories, like the cache size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-04-03 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated YARN-412:
--

Attachment: YARN-412.patch

> FifoScheduler incorrectly checking for node locality
> 
>
> Key: YARN-412
> URL: https://issues.apache.org/jira/browse/YARN-412
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
>  Labels: patch
> Attachments: YARN-412.patch
>
>
> In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
> data is local to a node by searching for the nodeAddress of the node in the 
> set of outstanding requests for the app.  This seems to be incorrect as it 
> should be checking hostname instead.  The offending line of code is 455:
> application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
> Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses 
> are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
> In the CapacityScheduler, it's done using hostname.  See 
> LeafQueue.assignNodeLocalContainers, line 1129
> application.getResourceRequest(priority, node.getHostName());
> Note that this bug does not affect the actual scheduling decisions made by 
> the FifoScheduler because even though it incorrectly determines that a request 
> is not local to the node, it will still schedule the request immediately 
> because it's rack-local.  However, this bug may be adversely affecting the 
> reporting of job status by underreporting the number of tasks that were node 
> local.
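
The mismatch is easy to see in isolation. Below is a minimal stand-alone sketch 
(a plain map stands in for the per-priority request table; it is illustrative 
only and not code from the attached patch): a lookup keyed by "hostname:port" 
can never hit entries stored under the bare hostname.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: why a lookup keyed by node address (hostname:port)
// misses requests that are stored by hostname.
public class LocalityKeyDemo {
  public static void main(String[] args) {
    Map<String, String> requestsByHost = new HashMap<String, String>();
    requestsByHost.put("host1.foo.com", "node-local request");

    String nodeAddress = "host1.foo.com:1234";            // hostname + command port
    String hostName = nodeAddress.split(":")[0];          // bare hostname

    System.out.println(requestsByHost.get(nodeAddress));  // null -> counted as not node-local
    System.out.println(requestsByHost.get(hostName));     // "node-local request"
  }
}
{code}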

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-04-03 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated YARN-412:
--

Attachment: (was: YARN-412.patch)

> FifoScheduler incorrectly checking for node locality
> 
>
> Key: YARN-412
> URL: https://issues.apache.org/jira/browse/YARN-412
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
>  Labels: patch
> Attachments: YARN-412.patch
>
>
> In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
> data is local to a node by searching for the nodeAddress of the node in the 
> set of outstanding requests for the app.  This seems to be incorrect as it 
> should be checking hostname instead.  The offending line of code is 455:
> application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
> Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses 
> are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
> In the CapacityScheduler, it's done using hostname.  See 
> LeafQueue.assignNodeLocalContainers, line 1129
> application.getResourceRequest(priority, node.getHostName());
> Note that this bug does not affect the actual scheduling decisions made by 
> the FifoScheduler because even though it incorrectly determines that a request 
> is not local to the node, it will still schedule the request immediately 
> because it's rack-local.  However, this bug may be adversely affecting the 
> reporting of job status by underreporting the number of tasks that were node 
> local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing

2013-04-03 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621255#comment-13621255
 ] 

Eli Collins commented on YARN-516:
--

I reverted this change (and the initial HADOOP-9357 patch). We'll put this fix 
back in the HADOOP-9357 patch if we do another rev.

> TestContainerLocalizer.testContainerLocalizerMain is failing
> 
>
> Key: YARN-516
> URL: https://issues.apache.org/jira/browse/YARN-516
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Andrew Wang
> Fix For: 2.0.5-beta
>
> Attachments: YARN-516.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits

2013-04-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621243#comment-13621243
 ] 

Hadoop QA commented on YARN-99:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12576823/yarn-99-20130403.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/663//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/663//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/663//console

This message is automatically generated.

> Jobs fail during resource localization when private distributed-cache hits 
> unix directory limits
> 
>
> Key: YARN-99
> URL: https://issues.apache.org/jira/browse/YARN-99
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Devaraj K
>Assignee: Omkar Vinit Joshi
> Attachments: yarn-99-20130324.patch, yarn-99-20130403.patch
>
>
> If we have multiple jobs which use the distributed cache with a large number of 
> small files, the directory limit is reached before the cache-size limit, and no 
> new directories can be created in the file cache. The jobs start failing with 
> the exception below.
> {code:xml}
> java.io.IOException: mkdir of 
> /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {code}
> We should have a mechanism to clean the cache files once the number of 
> directories crosses a specified limit, just as we do for the cache size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-04-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621238#comment-13621238
 ] 

Hadoop QA commented on YARN-193:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576820/YARN-193.13.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/662//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/662//console

This message is automatically generated.

> Scheduler.normalizeRequest does not account for allocation requests that 
> exceed maximumAllocation limits 
> -
>
> Key: YARN-193
> URL: https://issues.apache.org/jira/browse/YARN-193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.2-alpha, 3.0.0
>Reporter: Hitesh Shah
>Assignee: Zhijie Shen
> Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
> MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, 
> YARN-193.13.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, 
> YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-425) coverage fix for yarn api

2013-04-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621218#comment-13621218
 ] 

Hadoop QA commented on YARN-425:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12576764/YARN-425-trunk-b.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/661//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/661//console

This message is automatically generated.

> coverage fix for yarn api
> -
>
> Key: YARN-425
> URL: https://issues.apache.org/jira/browse/YARN-425
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
>Reporter: Aleksey Gorshkov
>Assignee: Aleksey Gorshkov
> Attachments: YARN-425-branch-0.23.patch, YARN-425-branch-2-b.patch, 
> YARN-425-branch-2.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, 
> YARN-425-trunk.patch
>
>
> coverage fix for yarn api
> patch YARN-425-trunk-a.patch for trunk
> patch YARN-425-branch-2.patch for branch-2
> patch YARN-425-branch-0.23.patch for branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy

2013-04-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621215#comment-13621215
 ] 

Hadoop QA commented on YARN-465:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12576782/YARN-465-trunk-a.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/659//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/659//console

This message is automatically generated.

> fix coverage  org.apache.hadoop.yarn.server.webproxy
> 
>
> Key: YARN-465
> URL: https://issues.apache.org/jira/browse/YARN-465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
>Reporter: Aleksey Gorshkov
>Assignee: Aleksey Gorshkov
> Attachments: YARN-465-branch-0.23-a.patch, 
> YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, 
> YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch
>
>
> fix coverage  org.apache.hadoop.yarn.server.webproxy
> patch YARN-465-trunk.patch for trunk
> patch YARN-465-branch-2.patch for branch-2
> patch YARN-465-branch-0.23.patch for branch-0.23
> There is an issue in branch-0.23: the patch does not create the .keep file.
> To fix it, run the following commands:
> mkdir 
> yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy
> touch 
> yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-381) Improve FS docs

2013-04-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621203#comment-13621203
 ] 

Hudson commented on YARN-381:
-

Integrated in Hadoop-trunk-Commit #3554 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3554/])
YARN-381. Improve fair scheduler docs. Contributed by Sandy Ryza. (Revision 
1464130)

 Result = SUCCESS
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464130
Files : 
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm


> Improve FS docs
> ---
>
> Key: YARN-381
> URL: https://issues.apache.org/jira/browse/YARN-381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Sandy Ryza
>Priority: Minor
> Fix For: 2.0.5-beta
>
> Attachments: YARN-381.patch
>
>
> The MR2 FS docs could use some improvements.
> Configuration:
> - sizebasedweight - what is the "size" here? Total memory usage?
> Pool properties:
> - minResources - what does min amount of aggregate memory mean given that 
> this is not a reservation?
> - maxResources - is this a hard limit?
> - weight: How is this  ratio configured?  Eg base is 1 and all weights are 
> relative to that?
> - schedulingMode - what is the default? Is fifo pure fifo, eg waits until all 
> tasks for the job are finished before launching the next job?
> There's no mention of ACLs, even though they're supported. See the CS docs 
> for comparison.
> Also there are a couple typos worth fixing while we're at it, eg "finish. 
> apps to run"
> Worth keeping in mind that some of these will need to be updated to reflect 
> that resource calculators are now pluggable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-04-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621204#comment-13621204
 ] 

Hadoop QA commented on YARN-427:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12576767/YARN-427-trunk-a.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/660//console

This message is automatically generated.

> Coverage fix for org.apache.hadoop.yarn.server.api.*
> 
>
> Key: YARN-427
> URL: https://issues.apache.org/jira/browse/YARN-427
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
>Reporter: Aleksey Gorshkov
>Assignee: Aleksey Gorshkov
> Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, 
> YARN-427-trunk-a.patch, YARN-427-trunk.patch
>
>
> Coverage fix for org.apache.hadoop.yarn.server.api.*
> patch YARN-427-trunk.patch for trunk
> patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-101) If the heartbeat message is lost, the nodestatus info of completed containers will be lost too.

2013-04-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621201#comment-13621201
 ] 

Hudson commented on YARN-101:
-

Integrated in Hadoop-trunk-Commit #3554 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3554/])
YARN-101. Fix NodeManager heartbeat processing to not lose track of 
completed containers in case of dropped heartbeats. Contributed by Xuan Gong. 
(Revision 1464105)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464105
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
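
As the quoted description below walks through, completed-container statuses are 
pruned while the NodeStatus is being built, i.e. before the heartbeat is known to 
have succeeded, so a dropped heartbeat loses them. A minimal sketch of the general 
idea behind the fix, purging only after a successful heartbeat (illustrative and 
simplified; class and method names are hypothetical, not the committed patch):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative only: defer pruning of completed containers until the
// heartbeat that reported them has succeeded.
public class HeartbeatSketch {
  private final List<String> completedContainers = new ArrayList<String>();

  public void onHeartbeatTick() {
    // Snapshot what we intend to report, but do not remove it yet.
    List<String> reported = new ArrayList<String>(completedContainers);
    try {
      sendHeartbeat(reported);
      // Only forget completed containers once the RM has seen them.
      completedContainers.removeAll(reported);
    } catch (Exception e) {
      // Heartbeat lost: keep the statuses and report them again next interval.
    }
  }

  private void sendHeartbeat(List<String> statuses) throws Exception {
    // Stand-in for the resource tracker heartbeat call; may throw on failure.
  }
}
{code}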


> If the heartbeat message is lost, the nodestatus info of completed containers 
> will be lost too.
> 
>
> Key: YARN-101
> URL: https://issues.apache.org/jira/browse/YARN-101
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
> Environment: suse.
>Reporter: xieguiming
>Assignee: Xuan Gong
>Priority: Minor
> Fix For: 2.0.5-beta
>
> Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, 
> YARN-101.4.patch, YARN-101.5.patch, YARN-101.6.patch
>
>
> see the red color:
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
>  protected void startStatusUpdater() {
> new Thread("Node Status Updater") {
>   @Override
>   @SuppressWarnings("unchecked")
>   public void run() {
> int lastHeartBeatID = 0;
> while (!isStopped) {
>   // Send heartbeat
>   try {
> synchronized (heartbeatMonitor) {
>   heartbeatMonitor.wait(heartBeatInterval);
> }
> {color:red} 
> // Before we send the heartbeat, we get the NodeStatus,
> // whose method removes completed containers.
> NodeStatus nodeStatus = getNodeStatus();
>  {color}
> nodeStatus.setResponseId(lastHeartBeatID);
> 
> NodeHeartbeatRequest request = recordFactory
> .newRecordInstance(NodeHeartbeatRequest.class);
> request.setNodeStatus(nodeStatus);   
> {color:red} 
>// But if the nodeHeartbeat fails, we've already removed the 
> containers away to know about it. We aren't handling a nodeHeartbeat failure 
> case here.
> HeartbeatResponse response =
>   resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
>{color} 
> if (response.getNodeAction() == NodeAction.SHUTDOWN) {
>   LOG
>   .info("Recieved SHUTDOWN signal from Resourcemanager as 
> part of heartbeat," +
>   " hence shutting down.");
>   NodeStatusUpdaterImpl.this.stop();
>   break;
> }
> if (response.getNodeAction() == NodeAction.REBOOT) {
>   LOG.info("Node is out of sync with ResourceManager,"
>   + " hence rebooting.");
>   NodeStatusUpdaterImpl.this.reboot();
>   break;
> }
> lastHeartBeatID = response.getResponseId();
> List<ContainerId> containersToCleanup = response
> .getContainersToCleanupList();
> if (containersToCleanup.size() != 0) {
>   dispatcher.getEventHandler().handle(
>   new CMgrCompletedContainersEvent(containersToCleanup));
> }
> List<ApplicationId> appsToCleanup =
> response.getApplicationsToCleanupList();
> //Only start tracking for keepAlive on FINISH_APP
> trackAppsForKeepAlive(appsToCleanup);
> if (appsToCleanup.size() != 0) {
>   dispatcher.getEventHandler().handle(
>   new CMgrCompletedAppsEvent(appsToCleanup));
> }
>   } catch (Throwable e) {
> // TODO Better error handling. Thread can die with the rest of the
> // NM still running.
> LOG.error("Caught exception in status-updater", e);
>   }
> }
>   }
> }.start();
>   }
>   private NodeStatus getNodeStatus() {
> NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.cla

[jira] [Commented] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object

2013-04-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621180#comment-13621180
 ] 

Xuan Gong commented on YARN-536:


Remove the getter and setter for ContainerState and ContainerStatus from the 
container interface, and remove them from the proto file. There is some test 
code that used the getter and setter to get the containerState or 
containerStatus from the container object:
/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java
/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java

> Remove ContainerStatus, ContainerState from Container api interface as they 
> will not be called by the container object
> --
>
> Key: YARN-536
> URL: https://issues.apache.org/jira/browse/YARN-536
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>
> Remove ContainerState, ContainerStatus from the Container interface. They will 
> not be called by the container object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-404) Node Manager leaks Data Node connections

2013-04-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-404:
-

Priority: Major  (was: Blocker)

Moving it off blocker status.

Devaraj, can you give us more information? Is this still happening? Tx.

> Node Manager leaks Data Node connections
> 
>
> Key: YARN-404
> URL: https://issues.apache.org/jira/browse/YARN-404
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.0.2-alpha, 0.23.6
>Reporter: Devaraj K
>Assignee: Devaraj K
>
> The RM fails to hand some applications to the NM for cleanup; because of this, 
> log aggregation does not happen for those applications, and it also leaks data 
> node connections on the NM side.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object

2013-04-03 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-536:
--

 Summary: Remove ContainerStatus, ContainerState from Container api 
interface as they will not be called by the container object
 Key: YARN-536
 URL: https://issues.apache.org/jira/browse/YARN-536
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong


Remove ContainerState, ContainerStatus from the Container interface. They will not 
be called by the container object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object

2013-04-03 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-536:
--

Assignee: Xuan Gong

> Remove ContainerStatus, ContainerState from Container api interface as they 
> will not be called by the container object
> --
>
> Key: YARN-536
> URL: https://issues.apache.org/jira/browse/YARN-536
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>
> Remove ContainerState, ContainerStatus from the Container interface. They will 
> not be called by the container object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-03 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-514:


Assignee: Zhijie Shen  (was: Bikas Saha)

> Delayed store operations should not result in RM unavailability for app 
> submission
> --
>
> Key: YARN-514
> URL: https://issues.apache.org/jira/browse/YARN-514
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Zhijie Shen
>
> Currently, app submission is the only store operation performed synchronously 
> because the app must be stored before the request returns with success. This 
> makes the RM susceptible to blocking all client threads on slow store 
> operations, resulting in RM being perceived as unavailable by clients.
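
A minimal sketch of the direction this implies, moving the store write off the 
client-facing RPC thread and finishing the submission from a callback once the 
write completes (illustrative only; the class and method names here are 
hypothetical and this is not the eventual design):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: decouple client-facing submission from a possibly slow
// state-store write by queueing the write instead of blocking on it.
public class AsyncSubmitSketch {
  private final ExecutorService storeDispatcher = Executors.newSingleThreadExecutor();

  public void submitApplication(final String appId) {
    // Return to the client without blocking on the store...
    storeDispatcher.execute(new Runnable() {
      public void run() {
        storeApplication(appId);  // slow store operation, off the RPC thread
        // ...and only move the app forward once it is durably stored.
        markStored(appId);
      }
    });
  }

  private void storeApplication(String appId) { /* write to the state store */ }

  private void markStored(String appId) { /* transition the app for scheduling */ }
}
{code}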

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits

2013-04-03 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-99:
--

Attachment: yarn-99-20130403.patch

> Jobs fail during resource localization when private distributed-cache hits 
> unix directory limits
> 
>
> Key: YARN-99
> URL: https://issues.apache.org/jira/browse/YARN-99
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Devaraj K
>Assignee: Omkar Vinit Joshi
> Attachments: yarn-99-20130324.patch, yarn-99-20130403.patch
>
>
> If we have multiple jobs which use the distributed cache with a large number of 
> small files, the directory limit is reached before the cache-size limit, and no 
> new directories can be created in the file cache. The jobs start failing with 
> the exception below.
> {code:xml}
> java.io.IOException: mkdir of 
> /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {code}
> We should have a mechanism to clean the cache files once the number of 
> directories crosses a specified limit, just as we do for the cache size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits

2013-04-03 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621128#comment-13621128
 ] 

Omkar Vinit Joshi commented on YARN-99:
---

Rebasing the patch as 467 is now committed.
This issue is related to 467 and the detailed information can be found here 
[underlying problem and proposed/implemented Solution | 
https://issues.apache.org/jira/browse/YARN-467?focusedCommentId=13615894&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13615894]

The only difference here is that the same problem is present in 
/usercache//filecache (the private user cache). We are using 
LocalCacheDirectoryManager for the user cache but not for the app cache, as it is 
highly unlikely for an application to have that many localized files.

The earlier implementation for the private cache computed the localized path 
inside ContainerLocalizer, i.e. in a different process. In order to centralize 
this, we have moved it to ResourceLocalizationService.LocalizerRunner, and the 
path is communicated to each ContainerLocalizer as part of the heartbeat. This 
way the LocalCacheDirectory is managed in one place.
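
For readers new to the directory-manager idea, here is a minimal sketch of the 
per-directory cap it enforces (illustrative only; the constant, naming, and flat 
bucketing below are hypothetical and do not mirror the LocalCacheDirectoryManager 
introduced by YARN-467):

{code:java}
// Illustrative only: hand out cache sub-directories so that no single
// directory exceeds a fixed number of entries, avoiding per-directory
// limits of filesystems such as ext3.
public class DirectoryCapSketch {
  private static final int PER_DIR_LIMIT = 8192; // cap per directory (example value)
  private long counter = 0;

  // Returns a relative sub-directory for the next localized resource.
  public synchronized String nextSubDirectory() {
    long bucket = counter++ / PER_DIR_LIMIT;
    // First PER_DIR_LIMIT entries go in the root, then "0", "1", "2", ...
    return bucket == 0 ? "" : Long.toString(bucket - 1, 36);
  }

  public static void main(String[] args) {
    DirectoryCapSketch mgr = new DirectoryCapSketch();
    System.out.println("first resource goes under: '" + mgr.nextSubDirectory() + "'");
  }
}
{code}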

> Jobs fail during resource localization when private distributed-cache hits 
> unix directory limits
> 
>
> Key: YARN-99
> URL: https://issues.apache.org/jira/browse/YARN-99
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Devaraj K
>Assignee: Omkar Vinit Joshi
> Attachments: yarn-99-20130324.patch
>
>
> If we have multiple jobs which use the distributed cache with a large number of 
> small files, the directory limit is reached before the cache-size limit, and no 
> new directories can be created in the file cache. The jobs start failing with 
> the exception below.
> {code:xml}
> java.io.IOException: mkdir of 
> /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {code}
> We should have a mechanism to clean the cache files once the number of 
> directories crosses a specified limit, just as we do for the cache size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-04-03 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-193:
-

Attachment: YARN-193.13.patch

Fix the double-setting bug and change the default max vcores to 4.

> Scheduler.normalizeRequest does not account for allocation requests that 
> exceed maximumAllocation limits 
> -
>
> Key: YARN-193
> URL: https://issues.apache.org/jira/browse/YARN-193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.2-alpha, 3.0.0
>Reporter: Hitesh Shah
>Assignee: Zhijie Shen
> Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
> MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, 
> YARN-193.13.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, 
> YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs

2013-04-03 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621114#comment-13621114
 ] 

Sandy Ryza commented on YARN-458:
-

Verified on a pseudo-distributed cluster that both the old and new configs work.

> YARN daemon addresses must be placed in many different configs
> --
>
> Key: YARN-458
> URL: https://issues.apache.org/jira/browse/YARN-458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-458.patch
>
>
> The YARN resourcemanager's address is included in four different configs: 
> yarn.resourcemanager.scheduler.address, 
> yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, 
> and yarn.resourcemanager.admin.address
> A new user trying to configure a cluster needs to know the names of all these 
> four configs.
> The same issue exists for nodemanagers.
> It would be much easier if they could simply specify 
> yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports 
> for the other ones would kick in.
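
A sketch of what the proposal means for yarn-site.xml (the host and port values 
below are placeholders for illustration):

{code:xml}
<!-- Before (sketch): the same RM host must be repeated across several keys. -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>rm.example.com:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>rm.example.com:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>rm.example.com:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>rm.example.com:8033</value>
</property>

<!-- After (sketch of the proposal): one hostname, default ports kick in. -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm.example.com</value>
</property>
{code}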

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-248) Security related work for RM restart

2013-04-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-248:
-

Summary: Security related work for RM restart  (was: Restore 
RMDelegationTokenSecretManager state on restart)

> Security related work for RM restart
> 
>
> Key: YARN-248
> URL: https://issues.apache.org/jira/browse/YARN-248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tom White
>Assignee: Bikas Saha
>
> On restart, the RM creates a new RMDelegationTokenSecretManager with fresh 
> state. This will cause problems for Oozie jobs running on secure clusters 
> since the delegation tokens stored in the job credentials (used by the Oozie 
> launcher job to submit a job to the RM) will not be recognized by the RM, and 
> recovery will fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-430) Add HDFS based store for RM which manages the store using directories

2013-04-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-430:
-

Summary: Add HDFS based store for RM which manages the store using 
directories  (was: Add HDFS based store for RM)

> Add HDFS based store for RM which manages the store using directories
> -
>
> Key: YARN-430
> URL: https://issues.apache.org/jira/browse/YARN-430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
>
> There is a generic FileSystem store but it does not take advantage of HDFS 
> features like directories, replication, DFSClient advanced settings for HA, 
> retries, etc. Writing a store that's optimized for HDFS would be good.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-04-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621073#comment-13621073
 ] 

Bikas Saha commented on YARN-193:
-

These values need to be on the conservative side so that they work on most 
installations. Given that 24-32GB of memory is becoming the baseline nowadays, an 
8GB default for the max is OK IMO. Given that 16 cores is becoming the baseline 
nowadays, 4 cores sounds like a good default for the max IMO. This is per 
container, and it's not easy to write code that actually maxes out 8 cores :P
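
For context on what normalizing against these limits looks like, here is a 
minimal sketch of rounding a memory ask up to the minimum increment and capping 
it at the maximum (illustrative arithmetic only, assuming a 1024MB minimum; the 
actual patch may instead reject over-limit requests):

{code:java}
// Illustrative only: normalize a memory request to the scheduler's
// minimum increment and cap it at the configured maximum allocation.
public class NormalizeSketch {
  static int normalizeMemory(int requested, int minMB, int maxMB) {
    int rounded = ((requested + minMB - 1) / minMB) * minMB; // round up to a multiple of min
    return Math.min(Math.max(rounded, minMB), maxMB);        // clamp into [min, max]
  }

  public static void main(String[] args) {
    // With min=1024 and an 8GB (8192MB) max as discussed above:
    System.out.println(normalizeMemory(12000, 1024, 8192)); // 8192, not 12288
    System.out.println(normalizeMemory(100, 1024, 8192));   // 1024
  }
}
{code}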

> Scheduler.normalizeRequest does not account for allocation requests that 
> exceed maximumAllocation limits 
> -
>
> Key: YARN-193
> URL: https://issues.apache.org/jira/browse/YARN-193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.2-alpha, 3.0.0
>Reporter: Hitesh Shah
>Assignee: Zhijie Shen
> Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
> MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, 
> YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, 
> YARN-193.8.patch, YARN-193.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-101) If the heartbeat message is lost, the nodestatus info of completed containers will be lost too.

2013-04-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621044#comment-13621044
 ] 

Vinod Kumar Vavilapalli commented on YARN-101:
--

Looks much better, +1, checking it in.

> If the heartbeat message is lost, the nodestatus info of completed containers 
> will be lost too.
> 
>
> Key: YARN-101
> URL: https://issues.apache.org/jira/browse/YARN-101
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
> Environment: suse.
>Reporter: xieguiming
>Assignee: Xuan Gong
>Priority: Minor
> Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, 
> YARN-101.4.patch, YARN-101.5.patch, YARN-101.6.patch
>
>
> see the red color:
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
>  protected void startStatusUpdater() {
> new Thread("Node Status Updater") {
>   @Override
>   @SuppressWarnings("unchecked")
>   public void run() {
> int lastHeartBeatID = 0;
> while (!isStopped) {
>   // Send heartbeat
>   try {
> synchronized (heartbeatMonitor) {
>   heartbeatMonitor.wait(heartBeatInterval);
> }
> {color:red} 
> // Before we send the heartbeat, we get the NodeStatus,
> // whose method removes completed containers.
> NodeStatus nodeStatus = getNodeStatus();
>  {color}
> nodeStatus.setResponseId(lastHeartBeatID);
> 
> NodeHeartbeatRequest request = recordFactory
> .newRecordInstance(NodeHeartbeatRequest.class);
> request.setNodeStatus(nodeStatus);   
> {color:red} 
>// But if the nodeHeartbeat fails, we've already removed the 
> containers away to know about it. We aren't handling a nodeHeartbeat failure 
> case here.
> HeartbeatResponse response =
>   resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
>{color} 
> if (response.getNodeAction() == NodeAction.SHUTDOWN) {
>   LOG
>   .info("Recieved SHUTDOWN signal from Resourcemanager as 
> part of heartbeat," +
>   " hence shutting down.");
>   NodeStatusUpdaterImpl.this.stop();
>   break;
> }
> if (response.getNodeAction() == NodeAction.REBOOT) {
>   LOG.info("Node is out of sync with ResourceManager,"
>   + " hence rebooting.");
>   NodeStatusUpdaterImpl.this.reboot();
>   break;
> }
> lastHeartBeatID = response.getResponseId();
> List<ContainerId> containersToCleanup = response
> .getContainersToCleanupList();
> if (containersToCleanup.size() != 0) {
>   dispatcher.getEventHandler().handle(
>   new CMgrCompletedContainersEvent(containersToCleanup));
> }
> List<ApplicationId> appsToCleanup =
> response.getApplicationsToCleanupList();
> //Only start tracking for keepAlive on FINISH_APP
> trackAppsForKeepAlive(appsToCleanup);
> if (appsToCleanup.size() != 0) {
>   dispatcher.getEventHandler().handle(
>   new CMgrCompletedAppsEvent(appsToCleanup));
> }
>   } catch (Throwable e) {
> // TODO Better error handling. Thread can die with the rest of the
> // NM still running.
> LOG.error("Caught exception in status-updater", e);
>   }
> }
>   }
> }.start();
>   }
>   private NodeStatus getNodeStatus() {
> NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class);
> nodeStatus.setNodeId(this.nodeId);
> int numActiveContainers = 0;
> List<ContainerStatus> containersStatuses = new 
> ArrayList<ContainerStatus>();
> for (Iterator<Entry<ContainerId, Container>> i =
> this.context.getContainers().entrySet().iterator(); i.hasNext();) {
>   Entry<ContainerId, Container> e = i.next();
>   ContainerId containerId = e.getKey();
>   Container container = e.getValue();
>   // Clone the container to send it to the RM
>   org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = 
>   container.cloneAndGetContainerStatus();
>   containersStatuses.add(containerStatus);
>   ++numActiveContainers;
>   LOG.info("Sending out status for container: " + containerStatus);
>   {color:red} 
>   // Here is the part that removes the completed containers.
>   if (containerStatus.getState() == ContainerState.COMPLETE) {
> // Remove
> i.remove();
>   {color} 
> LOG.info("Removed completed container " + container

[jira] [Commented] (YARN-527) Local filecache mkdir fails

2013-04-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621042#comment-13621042
 ] 

Vinod Kumar Vavilapalli commented on YARN-527:
--

If it is the 32K limit that caused it, the timing can't be more perfect. I just 
committed YARN-467, which addresses it for the public cache, and YARN-99, which 
takes care of the private cache, is in progress. These two JIRAs enforce a limit 
in YARN itself; the default is 8192.

Looking back again at your stack trace, I agree that it is very likely you are 
hitting the 32K limit.

Can I close this as a duplicate of YARN-467? You can verify the fix on 
2.0.5-beta when it is out.

> Local filecache mkdir fails
> ---
>
> Key: YARN-527
> URL: https://issues.apache.org/jira/browse/YARN-527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.0-alpha
> Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes 
> and six worker nodes.
>Reporter: Knut O. Hellan
>Priority: Minor
> Attachments: yarn-site.xml
>
>
> Jobs failed with no other explanation than this stack trace:
> 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1364591875320_0017_m_00_0: 
> java.io.IOException: mkdir of 
> /disk3/yarn/local/filecache/-4230789355400878397 failed
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932)
> at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
> at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Manually creating the directory worked. This behavior was common to at least 
> several nodes in the cluster.
> The situation was resolved by removing and recreating all 
> /disk?/yarn/local/filecache directories on all nodes.
> It is unclear whether Yarn struggled with the number of files or if there 
> were corrupt files in the caches. The situation was triggered by a node dying.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-04-03 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated YARN-412:
--

Attachment: (was: YARN-412.patch)

> FifoScheduler incorrectly checking for node locality
> 
>
> Key: YARN-412
> URL: https://issues.apache.org/jira/browse/YARN-412
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
>  Labels: patch
> Attachments: YARN-412.patch
>
>
> In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
> data is local to a node by searching for the nodeAddress of the node in the 
> set of outstanding requests for the app.  This seems to be incorrect as it 
> should be checking hostname instead.  The offending line of code is 455:
> application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
> Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses 
> are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
> In the CapacityScheduler, it's done using hostname.  See 
> LeafQueue.assignNodeLocalContainers, line 1129
> application.getResourceRequest(priority, node.getHostName());
> Note that this bug does not affect the actual scheduling decisions made by 
> the FifoScheduler because even though it incorrectly determines that a request 
> is not local to the node, it will still schedule the request immediately 
> because it's rack-local.  However, this bug may be adversely affecting the 
> reporting of job status by underreporting the number of tasks that were node 
> local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-04-03 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated YARN-412:
--

Attachment: YARN-412.patch

> FifoScheduler incorrectly checking for node locality
> 
>
> Key: YARN-412
> URL: https://issues.apache.org/jira/browse/YARN-412
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Roger Hoover
>Assignee: Roger Hoover
>Priority: Minor
>  Labels: patch
> Attachments: YARN-412.patch
>
>
> In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
> data is local to a node by searching for the nodeAddress of the node in the 
> set of outstanding requests for the app.  This seems to be incorrect as it 
> should be checking hostname instead.  The offending line of code is 455:
> application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
> Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses 
> are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
> In the CapacityScheduler, it's done using hostname.  See 
> LeafQueue.assignNodeLocalContainers, line 1129
> application.getResourceRequest(priority, node.getHostName());
> Note that this bug does not affect the actual scheduling decisions made by 
> the FifoScheduler because even though it incorrectly determines that a request 
> is not local to the node, it will still schedule the request immediately 
> because it's rack-local.  However, this bug may be adversely affecting the 
> reporting of job status by underreporting the number of tasks that were node 
> local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-528) Make IDs read only

2013-04-03 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621023#comment-13621023
 ] 

Robert Joseph Evans commented on YARN-528:
--

OK, I understand now.

I will try to find some time to play around with getting the AM ID to not have 
a wrapper at all.

> Make IDs read only
> --
>
> Key: YARN-528
> URL: https://issues.apache.org/jira/browse/YARN-528
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: YARN-528.txt, YARN-528.txt
>
>
> I really would like to rip out most if not all of the abstraction layer that 
> sits in-between Protocol Buffers, the RPC, and the actual user code.  We have 
> no plans to support any other serialization type, and the abstraction layer 
> just makes it more difficult to change protocols, makes changing them more 
> error-prone, and slows down the objects themselves.
> Completely doing that is a lot of work.  This JIRA is a first step towards 
> that.  It makes the various ID objects immutable.  If this patch is well 
> received I will try to go through other objects/classes of objects and update 
> them in a similar way.
> This is probably the last time we will be able to make a change like this 
> before 2.0 stabilizes and the YARN APIs can no longer be changed.
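
The shape of a read-only ID is roughly the following. This is a simplified 
illustration with hypothetical field names, not the actual ApplicationId or 
ContainerId classes touched by the patch: final fields set once in the 
constructor, no setters, and value-based equals/hashCode.

{code:java}
// Illustrative only: an immutable ID object with no setters.
public final class ImmutableIdSketch {
  private final long clusterTimestamp;
  private final int id;

  public ImmutableIdSketch(long clusterTimestamp, int id) {
    this.clusterTimestamp = clusterTimestamp;
    this.id = id;
  }

  public long getClusterTimestamp() { return clusterTimestamp; }

  public int getId() { return id; }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ImmutableIdSketch)) {
      return false;
    }
    ImmutableIdSketch other = (ImmutableIdSketch) o;
    return clusterTimestamp == other.clusterTimestamp && id == other.id;
  }

  @Override
  public int hashCode() {
    return 31 * Long.valueOf(clusterTimestamp).hashCode() + id;
  }
}
{code}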

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs

2013-04-03 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620974#comment-13620974
 ] 

Steve Loughran commented on YARN-535:
-

{code}
org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher 
 Time elapsed: 4137 sec  <<< ERROR!
java.lang.RuntimeException: Error parsing 'yarn-site.xml' : 
org.xml.sax.SAXParseException: Premature end of file.
at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2050)
at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1899)
at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1816)
at 
org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:465)
at 
org.apache.hadoop.conf.Configuration.asXmlDocument(Configuration.java:2127)
at 
org.apache.hadoop.conf.Configuration.writeXml(Configuration.java:2096)
at 
org.apache.hadoop.conf.Configuration.writeXml(Configuration.java:2086)
at 
org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.setup(TestUnmanagedAMLauncher.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at 
org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
Caused by: org.xml.sax.SAXParseException: Premature end of file.
at 
com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:246)
at 
com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:153)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:1887)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:1875)
at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1946)
... 29 more
{code}
This stack trace is a failure to read yarn-site.xml, the very file that is 
being written on line 63 of TestUnmanagedAMLauncher, i.e. a file that 
is already open for writing.

It is possible that some filesystems (here, HFS+) make that write visible while 
it is still in progress, triggering a failure which then corrupts later builds at 
init time:

{code}
$ ls -l target/test-classes/yarn-site.xml 
-rw-r--r--  1 stevel  staff  0  3 Apr 15:37 target/test-classes/yarn-site.xml
{code}

The corrupted file is newer than the one in src/test/resources, so Maven 
doesn't replace it on the next test run:
{code}
$ ls -l src/test/resources/yarn-site.xml 
-rw-r--r--@ 1 stevel  staff  830 28 Nov 16:29 src/test/resources/yarn-site.xml
{code}
As a result, follow-on tests fail when MiniYARNCluster tries to read it:

{code}
org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher 
 Time elapsed: 515 sec  <<< ERROR!
java.lang.RuntimeException: Error parsing 'yarn-site.xml' : 
org.xml.sax.SAXParseException: Premature end of file.
at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2050)
at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1899)
at 

[jira] [Created] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs

2013-04-03 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-535:
---

 Summary: TestUnmanagedAMLauncher can corrupt 
target/test-classes/yarn-site.xml during write phase, breaks later test runs
 Key: YARN-535
 URL: https://issues.apache.org/jira/browse/YARN-535
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications
Affects Versions: 3.0.0
 Environment: OS/X laptop, HFS+ filesystem
Reporter: Steve Loughran
Priority: Minor


The setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. As 
{{Configuration.writeXml()}} re-reads all resources, this will break if 
the (open-for-writing) resource is already visible as an empty file.

This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks 
later test runs, because it is not overwritten by later incremental builds due 
to timestamps.
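
One possible way to avoid the race, sketched below under the assumption that the test keeps rewriting {{target/test-classes/yarn-site.xml}}: render the configuration to an in-memory buffer first, then write the file in a single step, so the re-read inside {{writeXml()}} never sees a half-written, empty file. This is an illustrative workaround, not the committed fix.

{code}
// Sketch only (assumed approach, not the committed fix for this JIRA).
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;

public class SafeSiteWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // writeXml() re-reads all resources; do that against an in-memory buffer
    // while the on-disk yarn-site.xml is still intact.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    conf.writeXml(buffer);

    // Only now touch the real file, writing the fully rendered XML in one go.
    OutputStream out = new FileOutputStream("target/test-classes/yarn-site.xml");
    try {
      out.write(buffer.toByteArray());
    } finally {
      out.close();
    }
  }
}
{code}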



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy

2013-04-03 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-465:
--

Attachment: YARN-465-trunk-a.patch

> fix coverage  org.apache.hadoop.yarn.server.webproxy
> 
>
> Key: YARN-465
> URL: https://issues.apache.org/jira/browse/YARN-465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
>Reporter: Aleksey Gorshkov
>Assignee: Aleksey Gorshkov
> Attachments: YARN-465-branch-0.23-a.patch, 
> YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, 
> YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch
>
>
> fix coverage  org.apache.hadoop.yarn.server.webproxy
> patch YARN-465-trunk.patch for trunk
> patch YARN-465-branch-2.patch for branch-2
> patch YARN-465-branch-0.23.patch for branch-0.23
> There is an issue in branch-0.23: the patch does not create the .keep file.
> To fix it, run these commands:
> mkdir 
> yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy
> touch 
> yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy

2013-04-03 Thread Aleksey Gorshkov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620914#comment-13620914
 ] 

Aleksey Gorshkov commented on YARN-465:
---

Patches updated:

patch YARN-465-trunk-a.patch for trunk
patch YARN-465-branch-2-a.patch for branch-2
patch YARN-465-branch-0.23-a.patch for branch-0.23

> fix coverage  org.apache.hadoop.yarn.server.webproxy
> 
>
> Key: YARN-465
> URL: https://issues.apache.org/jira/browse/YARN-465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
>Reporter: Aleksey Gorshkov
>Assignee: Aleksey Gorshkov
> Attachments: YARN-465-branch-0.23-a.patch, 
> YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, 
> YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch
>
>
> fix coverage  org.apache.hadoop.yarn.server.webproxy
> patch YARN-465-trunk.patch for trunk
> patch YARN-465-branch-2.patch for branch-2
> patch YARN-465-branch-0.23.patch for branch-0.23
> There is an issue in branch-0.23: the patch does not create the .keep file.
> To fix it, run these commands:
> mkdir 
> yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy
> touch 
> yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-117) Enhance YARN service model

2013-04-03 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620913#comment-13620913
 ] 

Steve Loughran commented on YARN-117:
-

I'm not seeing all those tests failing locally, only 
{{TestUnmanagedAMLauncher}} and {{TestNMExpiry}}.

{code}
testNMExpiry(org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry)
  Time elapsed: 2797 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<2> but was:<0>
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:283)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:195)
at junit.framework.Assert.assertEquals(Assert.java:201)
at 
org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry.testNMExpiry(TestNMExpiry.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
{code}

and 
{code}
Running 
org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.579 sec <<< 
FAILURE!
org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher 
 Time elapsed: 579 sec  <<< ERROR!
org.apache.hadoop.yarn.YarnException: could not cleanup test dir: 
java.lang.RuntimeException: Error parsing 'yarn-site.xml' : 
org.xml.sax.SAXParseException: Premature end of file.
at 
org.apache.hadoop.yarn.server.MiniYARNCluster.(MiniYARNCluster.java:95)
at 
org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.setup(TestUnmanagedAMLauncher.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)



> Enhance YARN service model
> --
>
> Key: YARN-117
> URL: https://issues.apache.org/jira/browse/YARN-117
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-117.patch
>
>
> Having played with the YARN service model, there are some issues
> that I've identified based on past work and initial use.
> This JIRA is an overall one to cover those issues, with solutions pushed 
> out to separate JIRAs.
> h2. State model prevents the stopped state from being entered if you could not 
> successfully start the service.
> In the current lifecycle you cannot stop a service unless it was successfully 
> started, but
> * {{init()}} may acquire resources that need to be explicitly released
> * if the {{start()}} operation fails partway through, the {{stop()}} 
> operation may be needed to release resources.
> *Fix:* make {{stop()}} a valid state transition from all states and require 
> the implementations to be able to stop sa
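
The truncated fix above calls for a {{stop()}} that is valid from every state. A minimal illustrative sketch of such a stop, using a hypothetical class rather than the attached YARN-117 patch:

{code}
// Hypothetical illustration only, not the YARN-117 patch: a stop() that is safe
// to call from any state, including after a failed init() or start(), and that
// releases resources at most once.
public abstract class ExampleService {
  private boolean stopped = false;

  public final synchronized void stop() {
    if (stopped) {
      return;              // idempotent: repeated stop() calls are no-ops
    }
    stopped = true;
    releaseResources();    // always attempt cleanup, whatever state was reached
  }

  /** Release anything acquired in init()/start(); must tolerate partial startup. */
  protected abstract void releaseResources();
}
{code}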

[jira] [Updated] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy

2013-04-03 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-465:
--

Attachment: YARN-465-branch-2-a.patch
YARN-465-branch-0.23-a.patch

> fix coverage  org.apache.hadoop.yarn.server.webproxy
> 
>
> Key: YARN-465
> URL: https://issues.apache.org/jira/browse/YARN-465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
>Reporter: Aleksey Gorshkov
>Assignee: Aleksey Gorshkov
> Attachments: YARN-465-branch-0.23-a.patch, 
> YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, 
> YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch
>
>
> fix coverage  org.apache.hadoop.yarn.server.webproxy
> patch YARN-465-trunk.patch for trunk
> patch YARN-465-branch-2.patch for branch-2
> patch YARN-465-branch-0.23.patch for branch-0.23
> There is an issue in branch-0.23: the patch does not create the .keep file.
> To fix it, run these commands:
> mkdir 
> yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy
> touch 
> yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-04-03 Thread Aleksey Gorshkov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620845#comment-13620845
 ] 

Aleksey Gorshkov commented on YARN-427:
---

Patches were updated:
patch YARN-427-trunk-a.patch for trunk
patch YARN-427-branch-2-a.patch for branch-2 and branch-0.23

> Coverage fix for org.apache.hadoop.yarn.server.api.*
> 
>
> Key: YARN-427
> URL: https://issues.apache.org/jira/browse/YARN-427
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
>Reporter: Aleksey Gorshkov
>Assignee: Aleksey Gorshkov
> Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, 
> YARN-427-trunk-a.patch, YARN-427-trunk.patch
>
>
> Coverage fix for org.apache.hadoop.yarn.server.api.*
> patch YARN-427-trunk.patch for trunk
> patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-04-03 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-427:
--

Attachment: YARN-427-trunk-a.patch
YARN-427-branch-2-a.patch

> Coverage fix for org.apache.hadoop.yarn.server.api.*
> 
>
> Key: YARN-427
> URL: https://issues.apache.org/jira/browse/YARN-427
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
>Reporter: Aleksey Gorshkov
>Assignee: Aleksey Gorshkov
> Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, 
> YARN-427-trunk-a.patch, YARN-427-trunk.patch
>
>
> Coverage fix for org.apache.hadoop.yarn.server.api.*
> patch YARN-427-trunk.patch for trunk
> patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-425) coverage fix for yarn api

2013-04-03 Thread Aleksey Gorshkov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620843#comment-13620843
 ] 

Aleksey Gorshkov commented on YARN-425:
---


Updated the patch for trunk (YARN-425-trunk-b.patch)
and for branch-2 (YARN-425-branch-2-b.patch).



> coverage fix for yarn api
> -
>
> Key: YARN-425
> URL: https://issues.apache.org/jira/browse/YARN-425
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
>Reporter: Aleksey Gorshkov
>Assignee: Aleksey Gorshkov
> Attachments: YARN-425-branch-0.23.patch, YARN-425-branch-2-b.patch, 
> YARN-425-branch-2.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, 
> YARN-425-trunk.patch
>
>
> coverage fix for yarn api
> patch YARN-425-trunk-a.patch for trunk
> patch YARN-425-branch-2.patch for branch-2
> patch YARN-425-branch-0.23.patch for branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-425) coverage fix for yarn api

2013-04-03 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-425:
--

Attachment: YARN-425-trunk-b.patch

> coverage fix for yarn api
> -
>
> Key: YARN-425
> URL: https://issues.apache.org/jira/browse/YARN-425
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
>Reporter: Aleksey Gorshkov
>Assignee: Aleksey Gorshkov
> Attachments: YARN-425-branch-0.23.patch, YARN-425-branch-2-b.patch, 
> YARN-425-branch-2.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, 
> YARN-425-trunk.patch
>
>
> coverage fix for yarn api
> patch YARN-425-trunk-a.patch for trunk
> patch YARN-425-branch-2.patch for branch-2
> patch YARN-425-branch-0.23.patch for branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-425) coverage fix for yarn api

2013-04-03 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated YARN-425:
--

Attachment: YARN-425-branch-2-b.patch

> coverage fix for yarn api
> -
>
> Key: YARN-425
> URL: https://issues.apache.org/jira/browse/YARN-425
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
>Reporter: Aleksey Gorshkov
>Assignee: Aleksey Gorshkov
> Attachments: YARN-425-branch-0.23.patch, YARN-425-branch-2-b.patch, 
> YARN-425-branch-2.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, 
> YARN-425-trunk.patch
>
>
> coverage fix for yarn api
> patch YARN-425-trunk-a.patch for trunk
> patch YARN-425-branch-2.patch for branch-2
> patch YARN-425-branch-0.23.patch for branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-527) Local filecache mkdir fails

2013-04-03 Thread Knut O. Hellan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620711#comment-13620711
 ] 

Knut O. Hellan commented on YARN-527:
-

There is really no difference in how the directories are created. What probably 
happened under the hood was that the file system reached the maximum number of 
entries in the filecache directory; that limit is 32000 since we use EXT3. 
I don't have the exact numbers for any of the disks from my checks, but I 
remember seeing above 30k in some places. The reason we were able to manually 
create directories might be that some automatic cleanup was happening. 
Does YARN clean the file cache?
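
A quick diagnostic sketch for the theory above; the disk numbers and the /diskN/yarn/local/filecache layout (taken from the stack trace below) are assumptions for illustration:

{code}
// Diagnostic sketch only: count entries in each filecache directory to see
// whether it is approaching the EXT3 limit of roughly 32000 entries per directory.
import java.io.File;

public class FilecacheCount {
  public static void main(String[] args) {
    for (int disk = 1; disk <= 6; disk++) {   // adjust to the actual number of disks
      File dir = new File("/disk" + disk + "/yarn/local/filecache");
      String[] entries = dir.list();
      System.out.println(dir + ": "
          + (entries == null ? "unreadable or missing" : entries.length + " entries"));
    }
  }
}
{code}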

> Local filecache mkdir fails
> ---
>
> Key: YARN-527
> URL: https://issues.apache.org/jira/browse/YARN-527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.0-alpha
> Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes 
> and six worker nodes.
>Reporter: Knut O. Hellan
>Priority: Minor
> Attachments: yarn-site.xml
>
>
> Jobs failed with no other explanation than this stack trace:
> 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diag
> nostics report from attempt_1364591875320_0017_m_00_0: 
> java.io.IOException: mkdir of /disk3/yarn/local/filecache/-42307893
> 55400878397 failed
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932)
> at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
> at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Manually creating the directory worked. This behavior was common to at least 
> several nodes in the cluster.
> The situation was resolved by removing and recreating all 
> /disk?/yarn/local/filecache directories on all nodes.
> It is unclear whether YARN struggled with the number of files or whether there 
> were corrupt files in the caches. The situation was triggered by a node dying.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality

2013-04-03 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620705#comment-13620705
 ] 

Sandy Ryza commented on YARN-392:
-

OK, I will work on a patch for the non-blacklist proposal. To clarify, should 
location-specific requests be able to coexist with non-location-specific 
requests at the same priority?


> Make it possible to schedule to specific nodes without dropping locality
> 
>
> Key: YARN-392
> URL: https://issues.apache.org/jira/browse/YARN-392
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Sandy Ryza
> Attachments: YARN-392-1.patch, YARN-392.patch
>
>
> Currently it's not possible to specify scheduling requests for specific nodes 
> and nowhere else. The RM automatically relaxes locality to rack and * and 
> assigns non-specified machines to the app.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira