[jira] [Updated] (YARN-457) Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl

2013-04-02 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima updated YARN-457:
-

Attachment: YARN-457-2.patch

Sorry about that. I changed the code to call initLocalNewNodeReportList() before 
clearing this.updatedNodes.
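For reference, a rough sketch of what the guarded setter could look like after this 
change (the surrounding method shape and the addAll handling are my assumptions, not 
the actual patch):

{code}
public synchronized void setUpdatedNodes(final List<NodeReport> updatedNodes) {
  if (updatedNodes == null) {
    // Make sure the local list exists before clearing it, so a null-to-null
    // update no longer dereferences a null field.
    initLocalNewNodeReportList();
    this.updatedNodes.clear();
    return;
  }
  initLocalNewNodeReportList();
  this.updatedNodes.clear();
  this.updatedNodes.addAll(updatedNodes);
}
{code}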

> Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl
> 
>
> Key: YARN-457
> URL: https://issues.apache.org/jira/browse/YARN-457
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Kenji Kikushima
>Priority: Minor
>  Labels: Newbie
> Attachments: YARN-457-2.patch, YARN-457.patch
>
>
> {code}
> if (updatedNodes == null) {
>   this.updatedNodes.clear();
>   return;
> }
> {code}
> If this.updatedNodes is also null, a NullPointerException is thrown by the call to clear().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-04-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620651#comment-13620651
 ] 

Zhijie Shen commented on YARN-193:
--

{quote}
Default value of max-vcores of 32 might be too high.
{quote}

Why was 32 chosen originally?

In http://hortonworks.com/blog/apache-hadoop-yarn-background-and-an-overview/, 
it says:

2012 – 16+ cores, 48-96GB of RAM, 12x2TB or 12x3TB of disk.

How about choosing 16 instead?

{quote}
Why is conf being set 2 times for each value? Same for vcores.
{quote}

I'll fix the bug.
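Presumably the second setInt in the quoted test (shown in Bikas's comment below in 
this digest) was meant to target the maximum allocation; assuming that is the case, 
the corrected snippet would look something like this (a sketch, not the final patch):

{code}
// min (2048 MB) deliberately larger than max (1024 MB); RM init should reject it
conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 2048);
conf.setInt(YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB, 1024);
try {
  resourceManager.init(conf);
  fail("Exception is expected because the min memory allocation is"
      + " larger than the max memory allocation.");
} catch (YarnException e) {
  // Exception is expected.
}
{code}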

> Scheduler.normalizeRequest does not account for allocation requests that 
> exceed maximumAllocation limits 
> -
>
> Key: YARN-193
> URL: https://issues.apache.org/jira/browse/YARN-193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.2-alpha, 3.0.0
>Reporter: Hitesh Shah
>Assignee: Zhijie Shen
> Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
> MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, 
> YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, 
> YARN-193.8.patch, YARN-193.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-04-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620638#comment-13620638
 ] 

Bikas Saha commented on YARN-193:
-

Default value of max-vcores of 32 might be too high.

Why is conf being set 2 times for each value? Same for vcores.
{code}
+conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 2048);
+conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 1024);
+try {
+  resourceManager.init(conf);
+  fail("Exception is expected because the min memory allocation is" +
+  " larger than the max memory allocation.");
+} catch (YarnException e) {
+  // Exception is expected.
+}
{code}



> Scheduler.normalizeRequest does not account for allocation requests that 
> exceed maximumAllocation limits 
> -
>
> Key: YARN-193
> URL: https://issues.apache.org/jira/browse/YARN-193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.2-alpha, 3.0.0
>Reporter: Hitesh Shah
>Assignee: Zhijie Shen
> Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
> MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, 
> YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, 
> YARN-193.8.patch, YARN-193.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620617#comment-13620617
 ] 

Hudson commented on YARN-467:
-

Integrated in Hadoop-trunk-Commit #3552 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3552/])
YARN-467. Modify public distributed cache to localize files such that no 
local directory hits unix file count limits and thus prevent job failures. 
Contributed by Omkar Vinit Joshi. (Revision 1463823)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463823
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheDirectoryManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java


> Jobs fail during resource localization when public distributed-cache hits 
> unix directory limits
> ---
>
> Key: YARN-467
> URL: https://issues.apache.org/jira/browse/YARN-467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Fix For: 2.0.5-beta
>
> Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, 
> yarn-467-20130322.3.patch, yarn-467-20130322.patch, 
> yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, 
> yarn-467-20130401.patch, yarn-467-20130402.1.patch, 
> yarn-467-20130402.2.patch, yarn-467-20130402.patch
>
>
> If we have multiple jobs that use the distributed cache with a large number of 
> small files, the per-directory limit is reached before the cache size limit, and 
> no more directories can be created in the file cache (PUBLIC). The jobs then 
> start failing with the exception below.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 
> failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:

[jira] [Commented] (YARN-101) If the heartbeat message is lost, the nodestatus info of completed containers will be lost too.

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620610#comment-13620610
 ] 

Hadoop QA commented on YARN-101:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576714/YARN-101.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/658//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/658//console

This message is automatically generated.

> If the heartbeat message is lost, the nodestatus info of completed containers 
> will be lost too.
> 
>
> Key: YARN-101
> URL: https://issues.apache.org/jira/browse/YARN-101
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
> Environment: suse.
>Reporter: xieguiming
>Assignee: Xuan Gong
>Priority: Minor
> Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, 
> YARN-101.4.patch, YARN-101.5.patch, YARN-101.6.patch
>
>
> see the red color:
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
>  protected void startStatusUpdater() {
> new Thread("Node Status Updater") {
>   @Override
>   @SuppressWarnings("unchecked")
>   public void run() {
> int lastHeartBeatID = 0;
> while (!isStopped) {
>   // Send heartbeat
>   try {
> synchronized (heartbeatMonitor) {
>   heartbeatMonitor.wait(heartBeatInterval);
> }
> {color:red} 
> // Before we send the heartbeat, we get the NodeStatus,
> // whose method removes completed containers.
> NodeStatus nodeStatus = getNodeStatus();
>  {color}
> nodeStatus.setResponseId(lastHeartBeatID);
> 
> NodeHeartbeatRequest request = recordFactory
> .newRecordInstance(NodeHeartbeatRequest.class);
> request.setNodeStatus(nodeStatus);   
> {color:red} 
>    // But if the nodeHeartbeat fails, we've already removed the completed 
> containers, so the RM will never know about them. We aren't handling the 
> nodeHeartbeat failure case here.
> HeartbeatResponse response =
>   resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
>{color} 
> if (response.getNodeAction() == NodeAction.SHUTDOWN) {
>   LOG
>   .info("Recieved SHUTDOWN signal from Resourcemanager as 
> part of heartbeat," +
>   " hence shutting down.");
>   NodeStatusUpdaterImpl.this.stop();
>   break;
> }
> if (response.getNodeAction() == NodeAction.REBOOT) {
>   LOG.info("Node is out of sync with ResourceManager,"
>   + " hence rebooting.");
>   NodeStatusUpdaterImpl.this.reboot();
>   break;
> }
> lastHeartBeatID = response.getResponseId();
> List<ContainerId> containersToCleanup = response
> .getContainersToCleanupList();
> if (containersToCleanup.size() != 0) {
>   dispatcher.getEventHandler().handle(
>   new CMgrCompletedContainersEvent(containersToCleanup));
> }
> List<ApplicationId> appsToCleanup =
> response.getApplicationsToCleanupList();
> //Only start tracking for keepAlive on FINISH_APP
> trackAppsForKeepAlive(appsToCleanup);
> if (appsToCleanup.size() != 0) {
>   dispatcher.getEventHandler().handle(
>   new CMgrCompletedAppsEvent(appsToCleanup));
> }
>   } catch (Throwable e) {
> // TODO Better erro

[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits

2013-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620608#comment-13620608
 ] 

Vinod Kumar Vavilapalli commented on YARN-467:
--

Perfect, the latest patch looks good. Checking it in.

> Jobs fail during resource localization when public distributed-cache hits 
> unix directory limits
> ---
>
> Key: YARN-467
> URL: https://issues.apache.org/jira/browse/YARN-467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, 
> yarn-467-20130322.3.patch, yarn-467-20130322.patch, 
> yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, 
> yarn-467-20130401.patch, yarn-467-20130402.1.patch, 
> yarn-467-20130402.2.patch, yarn-467-20130402.patch
>
>
> If we have multiple jobs that use the distributed cache with a large number of 
> small files, the per-directory limit is reached before the cache size limit, and 
> no more directories can be created in the file cache (PUBLIC). The jobs then 
> start failing with the exception below.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 
> failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> We need a mechanism wherein we can create a directory hierarchy and 
> limit the number of files per directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-101) If the heartbeat message is lost, the nodestatus info of completed containers will be lost too.

2013-04-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-101:
---

Attachment: YARN-101.6.patch

Recreated the test case to verify the status of all containers in every heartbeat.
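As background for the test, here is a toy model (my sketch of the general retry-safe 
idea, not the actual patch or the real NM classes) of keeping completed-container 
statuses around until a heartbeat that carried them is acknowledged, so a lost 
heartbeat cannot lose them:

{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class HeartbeatStatusBuffer {
  // completed container id -> status string, kept until acknowledged
  private final Map<String, String> pendingCompleted =
      new LinkedHashMap<String, String>();

  // Record a completed container; it will be reported in every heartbeat
  // until the RM acknowledges a heartbeat that carried it.
  public synchronized void containerCompleted(String containerId, String status) {
    pendingCompleted.put(containerId, status);
  }

  // Snapshot to put into the next heartbeat; nothing is removed here.
  public synchronized Map<String, String> statusesForHeartbeat() {
    return new LinkedHashMap<String, String>(pendingCompleted);
  }

  // Called only after the heartbeat that carried 'reported' succeeded.
  public synchronized void heartbeatAcknowledged(Map<String, String> reported) {
    pendingCompleted.keySet().removeAll(reported.keySet());
  }

  public static void main(String[] args) {
    HeartbeatStatusBuffer buffer = new HeartbeatStatusBuffer();
    buffer.containerCompleted("container_1", "COMPLETE");
    Map<String, String> sent = buffer.statusesForHeartbeat();
    // Pretend that heartbeat was lost: the status is still pending.
    System.out.println(buffer.statusesForHeartbeat()); // {container_1=COMPLETE}
    // The next heartbeat succeeds and is acknowledged: now it can be dropped.
    buffer.heartbeatAcknowledged(sent);
    System.out.println(buffer.statusesForHeartbeat()); // {}
  }
}
{code}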

> If the heartbeat message is lost, the nodestatus info of completed containers 
> will be lost too.
> 
>
> Key: YARN-101
> URL: https://issues.apache.org/jira/browse/YARN-101
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
> Environment: suse.
>Reporter: xieguiming
>Assignee: Xuan Gong
>Priority: Minor
> Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, 
> YARN-101.4.patch, YARN-101.5.patch, YARN-101.6.patch
>
>
> see the red color:
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
>  protected void startStatusUpdater() {
> new Thread("Node Status Updater") {
>   @Override
>   @SuppressWarnings("unchecked")
>   public void run() {
> int lastHeartBeatID = 0;
> while (!isStopped) {
>   // Send heartbeat
>   try {
> synchronized (heartbeatMonitor) {
>   heartbeatMonitor.wait(heartBeatInterval);
> }
> {color:red} 
> // Before we send the heartbeat, we get the NodeStatus,
> // whose method removes completed containers.
> NodeStatus nodeStatus = getNodeStatus();
>  {color}
> nodeStatus.setResponseId(lastHeartBeatID);
> 
> NodeHeartbeatRequest request = recordFactory
> .newRecordInstance(NodeHeartbeatRequest.class);
> request.setNodeStatus(nodeStatus);   
> {color:red} 
>    // But if the nodeHeartbeat fails, we've already removed the completed 
> containers, so the RM will never know about them. We aren't handling the 
> nodeHeartbeat failure case here.
> HeartbeatResponse response =
>   resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
>{color} 
> if (response.getNodeAction() == NodeAction.SHUTDOWN) {
>   LOG
>   .info("Recieved SHUTDOWN signal from Resourcemanager as 
> part of heartbeat," +
>   " hence shutting down.");
>   NodeStatusUpdaterImpl.this.stop();
>   break;
> }
> if (response.getNodeAction() == NodeAction.REBOOT) {
>   LOG.info("Node is out of sync with ResourceManager,"
>   + " hence rebooting.");
>   NodeStatusUpdaterImpl.this.reboot();
>   break;
> }
> lastHeartBeatID = response.getResponseId();
> List<ContainerId> containersToCleanup = response
> .getContainersToCleanupList();
> if (containersToCleanup.size() != 0) {
>   dispatcher.getEventHandler().handle(
>   new CMgrCompletedContainersEvent(containersToCleanup));
> }
> List<ApplicationId> appsToCleanup =
> response.getApplicationsToCleanupList();
> //Only start tracking for keepAlive on FINISH_APP
> trackAppsForKeepAlive(appsToCleanup);
> if (appsToCleanup.size() != 0) {
>   dispatcher.getEventHandler().handle(
>   new CMgrCompletedAppsEvent(appsToCleanup));
> }
>   } catch (Throwable e) {
> // TODO Better error handling. Thread can die with the rest of the
> // NM still running.
> LOG.error("Caught exception in status-updater", e);
>   }
> }
>   }
> }.start();
>   }
>   private NodeStatus getNodeStatus() {
> NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class);
> nodeStatus.setNodeId(this.nodeId);
> int numActiveContainers = 0;
> List<ContainerStatus> containersStatuses = new 
> ArrayList<ContainerStatus>();
> for (Iterator<Entry<ContainerId, Container>> i =
> this.context.getContainers().entrySet().iterator(); i.hasNext();) {
>   Entry<ContainerId, Container> e = i.next();
>   ContainerId containerId = e.getKey();
>   Container container = e.getValue();
>   // Clone the container to send it to the RM
>   org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = 
>   container.cloneAndGetContainerStatus();
>   containersStatuses.add(containerStatus);
>   ++numActiveContainers;
>   LOG.info("Sending out status for container: " + containerStatus);
>   {color:red} 
>   // Here is the part that removes the completed containers.
>   if (containerStatus.getState() == ContainerState.COMPLETE) {
> // Remove
> i.remove();
>   {color} 
> LOG.info("Removed completed container " + containerId);
>   }
> 

[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620546#comment-13620546
 ] 

Hadoop QA commented on YARN-467:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12576705/yarn-467-20130402.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/657//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/657//console

This message is automatically generated.

> Jobs fail during resource localization when public distributed-cache hits 
> unix directory limits
> ---
>
> Key: YARN-467
> URL: https://issues.apache.org/jira/browse/YARN-467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, 
> yarn-467-20130322.3.patch, yarn-467-20130322.patch, 
> yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, 
> yarn-467-20130401.patch, yarn-467-20130402.1.patch, 
> yarn-467-20130402.2.patch, yarn-467-20130402.patch
>
>
> If we have multiple jobs that use the distributed cache with a large number of 
> small files, the per-directory limit is reached before the cache size limit, and 
> no more directories can be created in the file cache (PUBLIC). The jobs then 
> start failing with the exception below.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 
> failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> We need a mechanism wherein we can create a directory hierarchy and 
> limit the number of files per directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits

2013-04-02 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-467:
---

Attachment: yarn-467-20130402.2.patch

> Jobs fail during resource localization when public distributed-cache hits 
> unix directory limits
> ---
>
> Key: YARN-467
> URL: https://issues.apache.org/jira/browse/YARN-467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, 
> yarn-467-20130322.3.patch, yarn-467-20130322.patch, 
> yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, 
> yarn-467-20130401.patch, yarn-467-20130402.1.patch, 
> yarn-467-20130402.2.patch, yarn-467-20130402.patch
>
>
> If we have multiple jobs that use the distributed cache with a large number of 
> small files, the per-directory limit is reached before the cache size limit, and 
> no more directories can be created in the file cache (PUBLIC). The jobs then 
> start failing with the exception below.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 
> failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> We need a mechanism wherein we can create a directory hierarchy and 
> limit the number of files per directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits

2013-04-02 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620535#comment-13620535
 ] 

Omkar Vinit Joshi commented on YARN-467:


I have tested this code for the scenarios below (a rough sketch of the 
directory-hierarchy idea follows the list):
* I used 4 local-dirs to verify that localization gets distributed across them 
and that LocalCacheDirectoryManager manages each of them separately.
* I tested various values of 
"yarn.nodemanager.local-cache.max-files-per-directory": <= 36, 37, 40, and much 
larger.
* I modified the cache cleanup interval and the cache target size (in MB) to 
verify that older files get removed from the cache and that 
LocalCacheDirectoryManager's sub-directories get reused.
* I verified that we never end up with more files or sub-directories in any 
local directory than what is specified in the configuration.
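The directory-hierarchy idea itself can be illustrated with a toy mapping from a 
running file counter to a nested relative path (this is only my illustration, not 
the real LocalCacheDirectoryManager, which reportedly spreads sub-directories over 
the characters 0-9 and a-z):

{code}
public class DirectoryHierarchySketch {
  private static final String ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz";

  // Map the n-th localized file to a relative sub-directory path such that each
  // leaf directory receives at most filesPerDirectory files.
  static String relativePathFor(long fileIndex, int filesPerDirectory) {
    long dirIndex = fileIndex / filesPerDirectory;  // which leaf directory
    if (dirIndex == 0) {
      return "";                      // the first files stay in the root cache dir
    }
    StringBuilder path = new StringBuilder();
    while (dirIndex > 0) {            // base-36 digits become path components
      path.insert(0, "/" + ALPHABET.charAt((int) (dirIndex % 36)));
      dirIndex /= 36;
    }
    return path.substring(1);         // drop the leading slash
  }

  public static void main(String[] args) {
    for (long i = 0; i < 8; i++) {
      System.out.println(i + " -> '" + relativePathFor(i, 2) + "'");
    }
    // 0,1 -> ''   2,3 -> '1'   4,5 -> '2'   6,7 -> '3'
  }
}
{code}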

> Jobs fail during resource localization when public distributed-cache hits 
> unix directory limits
> ---
>
> Key: YARN-467
> URL: https://issues.apache.org/jira/browse/YARN-467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, 
> yarn-467-20130322.3.patch, yarn-467-20130322.patch, 
> yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, 
> yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.patch
>
>
> If we have multiple jobs that use the distributed cache with a large number of 
> small files, the per-directory limit is reached before the cache size limit, and 
> no more directories can be created in the file cache (PUBLIC). The jobs then 
> start failing with the exception below.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 
> failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> We need a mechanism wherein we can create a directory hierarchy and 
> limit the number of files per directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620501#comment-13620501
 ] 

Hadoop QA commented on YARN-458:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576699/YARN-458.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/656//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/656//console

This message is automatically generated.

> YARN daemon addresses must be placed in many different configs
> --
>
> Key: YARN-458
> URL: https://issues.apache.org/jira/browse/YARN-458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-458.patch
>
>
> The YARN resourcemanager's address is included in four different configs: 
> yarn.resourcemanager.scheduler.address, 
> yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, 
> and yarn.resourcemanager.admin.address
> A new user trying to configure a cluster needs to know the names of all these 
> four configs.
> The same issue exists for nodemanagers.
> It would be much easier if they could simply specify 
> yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports 
> for the other ones would kick in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs

2013-04-02 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-458:


Component/s: resourcemanager
 nodemanager

> YARN daemon addresses must be placed in many different configs
> --
>
> Key: YARN-458
> URL: https://issues.apache.org/jira/browse/YARN-458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-458.patch
>
>
> The YARN resourcemanager's address is included in four different configs: 
> yarn.resourcemanager.scheduler.address, 
> yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, 
> and yarn.resourcemanager.admin.address
> A new user trying to configure a cluster needs to know the names of all these 
> four configs.
> The same issue exists for nodemanagers.
> It would be much easier if they could simply specify 
> yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports 
> for the other ones would kick in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs

2013-04-02 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-458:


Affects Version/s: 2.0.3-alpha

> YARN daemon addresses must be placed in many different configs
> --
>
> Key: YARN-458
> URL: https://issues.apache.org/jira/browse/YARN-458
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-458.patch
>
>
> The YARN resourcemanager's address is included in four different configs: 
> yarn.resourcemanager.scheduler.address, 
> yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, 
> and yarn.resourcemanager.admin.address
> A new user trying to configure a cluster needs to know the names of all these 
> four configs.
> The same issue exists for nodemanagers.
> It would be much easier if they could simply specify 
> yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports 
> for the other ones would kick in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs

2013-04-02 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-458:


Description: 
The YARN resourcemanager's address is included in four different configs: 
yarn.resourcemanager.scheduler.address, 
yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, 
and yarn.resourcemanager.admin.address

A new user trying to configure a cluster needs to know the names of all these 
four configs.

The same issue exists for nodemanagers.

It would be much easier if they could simply specify 
yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports 
for the other ones would kick in.

  was:
The YARN resourcemanager's address is included in four different configs: 
yarn.resourcemanager.scheduler.address, 
yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, 
and yarn.resourcemanager.admin.address

A new user trying to configure a cluster needs to know the names of all these 
four configs.

It would be much easier if they could simply specify 
yarn.resourcemanager.address and default ports for the other ones would kick in.


> YARN daemon addresses must be placed in many different configs
> --
>
> Key: YARN-458
> URL: https://issues.apache.org/jira/browse/YARN-458
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-458.patch
>
>
> The YARN resourcemanager's address is included in four different configs: 
> yarn.resourcemanager.scheduler.address, 
> yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, 
> and yarn.resourcemanager.admin.address
> A new user trying to configure a cluster needs to know the names of all these 
> four configs.
> The same issue exists for nodemanagers.
> It would be much easier if they could simply specify 
> yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports 
> for the other ones would kick in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-458) Resource manager address must be placed in four different configs

2013-04-02 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620489#comment-13620489
 ] 

Sandy Ryza commented on YARN-458:
-

Uploaded a patch that adds yarn.resourcemanager.hostname and 
yarn.nodemanager.hostname properties, and changes all the other configs to use 
${yarn.resourcemanager.hostname} and ${yarn.nodemanager.hostname}.
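To illustrate the mechanism (my sketch with a made-up hostname, not part of the 
patch): Hadoop's Configuration expands ${property} references when a value is read, 
so once the hostname property is set, the per-daemon addresses can default to it:

{code}
import org.apache.hadoop.conf.Configuration;

public class HostnameDefaultsSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.resourcemanager.hostname", "rm.example.com"); // made-up host
    // A default like this in yarn-default.xml would resolve against the hostname.
    conf.set("yarn.resourcemanager.scheduler.address",
        "${yarn.resourcemanager.hostname}:8030");
    // Prints rm.example.com:8030 because Configuration expands ${...} on get().
    System.out.println(conf.get("yarn.resourcemanager.scheduler.address"));
  }
}
{code}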



> Resource manager address must be placed in four different configs
> -
>
> Key: YARN-458
> URL: https://issues.apache.org/jira/browse/YARN-458
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-458.patch
>
>
> The YARN resourcemanager's address is included in four different configs: 
> yarn.resourcemanager.scheduler.address, 
> yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, 
> and yarn.resourcemanager.admin.address
> A new user trying to configure a cluster needs to know the names of all these 
> four configs.
> It would be much easier if they could simply specify 
> yarn.resourcemanager.address and default ports for the other ones would kick 
> in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs

2013-04-02 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-458:


Summary: YARN daemon addresses must be placed in many different configs  
(was: Resource manager address must be placed in four different configs)

> YARN daemon addresses must be placed in many different configs
> --
>
> Key: YARN-458
> URL: https://issues.apache.org/jira/browse/YARN-458
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-458.patch
>
>
> The YARN resourcemanager's address is included in four different configs: 
> yarn.resourcemanager.scheduler.address, 
> yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, 
> and yarn.resourcemanager.admin.address
> A new user trying to configure a cluster needs to know the names of all these 
> four configs.
> It would be much easier if they could simply specify 
> yarn.resourcemanager.address and default ports for the other ones would kick 
> in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-458) Resource manager address must be placed in four different configs

2013-04-02 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-458:


Attachment: YARN-458.patch

> Resource manager address must be placed in four different configs
> -
>
> Key: YARN-458
> URL: https://issues.apache.org/jira/browse/YARN-458
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-458.patch
>
>
> The YARN resourcemanager's address is included in four different configs: 
> yarn.resourcemanager.scheduler.address, 
> yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, 
> and yarn.resourcemanager.admin.address
> A new user trying to configure a cluster needs to know the names of all these 
> four configs.
> It would be much easier if they could simply specify 
> yarn.resourcemanager.address and default ports for the other ones would kick 
> in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-534) AM max attempts is not checked when the RM restarts and tries to recover attempts

2013-04-02 Thread Jian He (JIRA)
Jian He created YARN-534:


 Summary: AM max attempts is not checked when the RM restarts and tries to 
recover attempts
 Key: YARN-534
 URL: https://issues.apache.org/jira/browse/YARN-534
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He


Currently, AM max attempts is only checked when the current attempt fails, to 
decide whether to create a new attempt. If the RM restarts before the 
max attempt fails, it does not clean the state store, so when the RM comes back 
it will retry the attempt again.
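A toy illustration of the missing guard (my sketch, not the actual patch; the real 
check would live in the RM's recovery path and use the configured AM max attempts):

{code}
// Decide, during recovery, whether another attempt may still be started.
static boolean mayStartNewAttemptOnRecovery(int attemptsAlreadyRun, int amMaxAttempts) {
  return attemptsAlreadyRun < amMaxAttempts;
}

// e.g. with am-max-attempts = 2 and 2 recovered attempts, no new attempt is allowed
// and the application should be marked as failed instead of being retried again.
{code}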

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-495) Containers are not terminated when the NM is rebooted

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620448#comment-13620448
 ] 

Hadoop QA commented on YARN-495:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576695/YARN-495.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/655//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/655//console

This message is automatically generated.

> Containers are not terminated when the NM is rebooted
> -
>
> Key: YARN-495
> URL: https://issues.apache.org/jira/browse/YARN-495
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-495.1.patch, YARN-495.2.patch
>
>
> When a reboot command is sent from the RM, the node manager doesn't clean up 
> the containers while it is stopping.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-495) Containers are not terminated when the NM is rebooted

2013-04-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620443#comment-13620443
 ] 

Jian He commented on YARN-495:
--

Uploaded a patch that changes the NM behavior from REBOOT to RESYNC when the RM is restarted.

> Containers are not terminated when the NM is rebooted
> -
>
> Key: YARN-495
> URL: https://issues.apache.org/jira/browse/YARN-495
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-495.1.patch, YARN-495.2.patch
>
>
> When a reboot command is sent from the RM, the node manager doesn't clean up 
> the containers while it is stopping.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-495) Containers are not terminated when the NM is rebooted

2013-04-02 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-495:
-

Attachment: YARN-495.2.patch

> Containers are not terminated when the NM is rebooted
> -
>
> Key: YARN-495
> URL: https://issues.apache.org/jira/browse/YARN-495
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-495.1.patch, YARN-495.2.patch
>
>
> When a reboot command is sent from the RM, the node manager doesn't clean up 
> the containers while it is stopping.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-533) Pointing to the config property when throwing/logging the config-related exception

2013-04-02 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-533:


 Summary: Pointing to the config property when throwing/logging the 
config-related exception
 Key: YARN-533
 URL: https://issues.apache.org/jira/browse/YARN-533
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


When throwing/logging errors related to configuration, we should always point 
to the configuration property, to let users know which property needs to be 
changed. 
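A minimal illustration of the practice (my example, reusing a constant that happens 
to appear earlier in this digest, not code from an actual patch):

{code}
// Name the offending property in the error so users know what to change.
private static void validateMinimumAllocation(int minMb) {
  if (minMb <= 0) {
    throw new IllegalArgumentException("Invalid value for "
        + YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB
        + ": minimum allocation (" + minMb + " MB) must be positive.");
  }
}
{code}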

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620412#comment-13620412
 ] 

Hadoop QA commented on YARN-467:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12576688/yarn-467-20130402.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/654//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/654//console

This message is automatically generated.

> Jobs fail during resource localization when public distributed-cache hits 
> unix directory limits
> ---
>
> Key: YARN-467
> URL: https://issues.apache.org/jira/browse/YARN-467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, 
> yarn-467-20130322.3.patch, yarn-467-20130322.patch, 
> yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, 
> yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.patch
>
>
> If we have multiple jobs that use the distributed cache with a large number of 
> small files, the per-directory limit is reached before the cache size limit, and 
> no more directories can be created in the file cache (PUBLIC). The jobs then 
> start failing with the exception below.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 
> failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> We need a mechanism wherein we can create a directory hierarchy and 
> limit the number of files per directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits

2013-04-02 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-467:
---

Attachment: yarn-467-20130402.1.patch

fixing test issue... that check is no longer valid.

> Jobs fail during resource localization when public distributed-cache hits 
> unix directory limits
> ---
>
> Key: YARN-467
> URL: https://issues.apache.org/jira/browse/YARN-467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, 
> yarn-467-20130322.3.patch, yarn-467-20130322.patch, 
> yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, 
> yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.patch
>
>
> If we have multiple jobs that use the distributed cache with a large number of 
> small files, the per-directory limit is reached before the cache size limit, and 
> no more directories can be created in the file cache (PUBLIC). The jobs then 
> start failing with the exception below.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 
> failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> We need a mechanism wherein we can create a directory hierarchy and 
> limit the number of files per directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620385#comment-13620385
 ] 

Hadoop QA commented on YARN-193:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576680/YARN-193.12.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/653//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/653//console

This message is automatically generated.

> Scheduler.normalizeRequest does not account for allocation requests that 
> exceed maximumAllocation limits 
> -
>
> Key: YARN-193
> URL: https://issues.apache.org/jira/browse/YARN-193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.2-alpha, 3.0.0
>Reporter: Hitesh Shah
>Assignee: Zhijie Shen
> Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
> MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, 
> YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, 
> YARN-193.8.patch, YARN-193.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620384#comment-13620384
 ] 

Hadoop QA commented on YARN-467:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12576681/yarn-467-20130402.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/652//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/652//console

This message is automatically generated.

> Jobs fail during resource localization when public distributed-cache hits 
> unix directory limits
> ---
>
> Key: YARN-467
> URL: https://issues.apache.org/jira/browse/YARN-467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, 
> yarn-467-20130322.3.patch, yarn-467-20130322.patch, 
> yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, 
> yarn-467-20130401.patch, yarn-467-20130402.patch
>
>
> If we have multiple jobs that use the distributed cache with many small 
> files, the per-directory limit is reached before the cache size limit, and no 
> further directories can be created in the file cache (PUBLIC). The jobs start 
> failing with the exception below.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 
> failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> We need a mechanism that creates a directory hierarchy and 
> limits the number of files per directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits

2013-04-02 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-467:
---

Attachment: yarn-467-20130402.patch

Fixing the issues below:
1) All the formatting issues
2) Adding one additional test case that checks the directory state transition 
FULL -> NON_FULL -> FULL (a minimal sketch of that transition follows below)
3) Javadoc warnings
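
For context, the FULL -> NON_FULL -> FULL transition mentioned in 2) can be 
sketched as a per-directory file counter that flips state around a configured 
limit. A minimal sketch, assuming a hypothetical CacheDir class (not the code 
in the attached patch):

{code}
// Sketch only: a per-directory counter drives the FULL <-> NON_FULL transition.
// CacheDir, fileAdded(), fileRemoved() and perDirectoryFileLimit are assumed
// names for illustration, not the classes added by the YARN-467 patch.
public class CacheDir {
  enum State { NON_FULL, FULL }

  private final int perDirectoryFileLimit;  // bounded by the filesystem's per-dir limit
  private int fileCount = 0;
  private State state = State.NON_FULL;

  public CacheDir(int perDirectoryFileLimit) {
    this.perDirectoryFileLimit = perDirectoryFileLimit;
  }

  /** Called when a localized file is placed in this directory. */
  public synchronized void fileAdded() {
    fileCount++;
    if (fileCount >= perDirectoryFileLimit) {
      state = State.FULL;        // stop choosing this directory for new files
    }
  }

  /** Called when a cached file is deleted from this directory. */
  public synchronized void fileRemoved() {
    fileCount--;
    if (fileCount < perDirectoryFileLimit) {
      state = State.NON_FULL;    // directory can accept new files again
    }
  }

  public synchronized State getState() {
    return state;
  }
}
{code}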

> Jobs fail during resource localization when public distributed-cache hits 
> unix directory limits
> ---
>
> Key: YARN-467
> URL: https://issues.apache.org/jira/browse/YARN-467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.0.0-alpha
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
> Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, 
> yarn-467-20130322.3.patch, yarn-467-20130322.patch, 
> yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, 
> yarn-467-20130401.patch, yarn-467-20130402.patch
>
>
> If we have multiple jobs that use the distributed cache with many small 
> files, the per-directory limit is reached before the cache size limit, and no 
> further directories can be created in the file cache (PUBLIC). The jobs start 
> failing with the exception below.
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 
> failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
>   at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> We need a mechanism that creates a directory hierarchy and 
> limits the number of files per directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-04-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-193:
-

Attachment: YARN-193.12.patch

1. Remove the DISABLE_RESOURCELIMIT_CHECK feature, and its related test cases.

2. Rewrite the log messages, and output them through LOG.warn.

3. Add javadocs for InvalidResourceRequestException.

4. Check whether thrown exception is InvalidResourceRequestException in 
TestClientRMService.

5. Add the test case of ask > max in TestSchedulerUtils (a rough sketch of this 
check appears below).

6. Fix other minor issues commented on by Bikas and Hitesh (e.g., typo, 
unnecessary import).

7. Rebase with YARN-382.
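
A rough sketch of the check exercised by 5), assuming a hypothetical 
validateResourceRequest helper and a stand-in exception class (the real patch 
adds InvalidResourceRequestException, but the names and signatures below are 
illustrative, not the committed YARN-193 code):

{code}
// Sketch only: reject an ask whose memory or vcores fall outside the configured
// maximum allocation. Helper name, signature and the nested exception are assumed.
public class ResourceLimitCheck {

  public static void validateResourceRequest(
      int askMemory, int askVcores, int maxMemory, int maxVcores)
      throws InvalidResourceRequestException {
    if (askMemory < 0 || askMemory > maxMemory) {
      throw new InvalidResourceRequestException(
          "Invalid resource request, requested memory " + askMemory
              + " is outside the allowed range [0, " + maxMemory + "]");
    }
    if (askVcores < 0 || askVcores > maxVcores) {
      throw new InvalidResourceRequestException(
          "Invalid resource request, requested virtual cores " + askVcores
              + " is outside the allowed range [0, " + maxVcores + "]");
    }
  }

  /** Hypothetical stand-in for the exception introduced by the patch. */
  public static class InvalidResourceRequestException extends Exception {
    public InvalidResourceRequestException(String message) {
      super(message);
    }
  }
}
{code}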


> Scheduler.normalizeRequest does not account for allocation requests that 
> exceed maximumAllocation limits 
> -
>
> Key: YARN-193
> URL: https://issues.apache.org/jira/browse/YARN-193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.2-alpha, 3.0.0
>Reporter: Hitesh Shah
>Assignee: Zhijie Shen
> Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
> MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, 
> YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, 
> YARN-193.8.patch, YARN-193.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-532) RMAdminProtocolPBClientImpl should implement Closeable

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620362#comment-13620362
 ] 

Hadoop QA commented on YARN-532:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576674/YARN-532.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/651//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/651//console

This message is automatically generated.

> RMAdminProtocolPBClientImpl should implement Closeable
> --
>
> Key: YARN-532
> URL: https://issues.apache.org/jira/browse/YARN-532
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: YARN-532.txt
>
>
> Required for RPC.stopProxy to work. Already done in most of the other 
> protocols. (MAPREDUCE-5117 addressing the one other protocol missing this)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-528) Make IDs read only

2013-04-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620348#comment-13620348
 ] 

Siddharth Seth commented on YARN-528:
-

bq. I really don't understand how this is supposed to work. How do we create 
fewer objects by wrapping them in more objects? I can see us doing something 
like deduping the objects that come over the wire, but I don't see how wrapping 
works here. 
Not compared to using Protos directly (which wasn't really an option), but 
compared to the alternative of converting only at the RPC layer.

> Make IDs read only
> --
>
> Key: YARN-528
> URL: https://issues.apache.org/jira/browse/YARN-528
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: YARN-528.txt, YARN-528.txt
>
>
> I really would like to rip out most if not all of the abstraction layer that 
> sits in-between Protocol Buffers, the RPC, and the actual user code.  We have 
> no plans to support any other serialization type, and the abstraction layer 
> just makes it more difficult to change protocols, makes changing them more 
> error prone, and slows down the objects themselves.  
> Completely doing that is a lot of work.  This JIRA is a first step towards 
> that.  It makes the various ID objects immutable.  If this patch is well 
> received I will try to go through other objects/classes of objects and update 
> them in a similar way.
> This is probably the last time we will be able to make a change like this 
> before 2.0 stabilizes and YARN APIs will not be able to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-532) RMAdminProtocolPBClientImpl should implement Closeable

2013-04-02 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-532:


Attachment: YARN-532.txt

Trivial fix.

> RMAdminProtocolPBClientImpl should implement Closeable
> --
>
> Key: YARN-532
> URL: https://issues.apache.org/jira/browse/YARN-532
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: YARN-532.txt
>
>
> Required for RPC.stopProxy to work. Already done in most of the other 
> protocols. (MAPREDUCE-5117 addressing the one other protocol missing this)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-532) RMAdminProtocolPBClientImpl should implement Closeable

2013-04-02 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-532:
---

 Summary: RMAdminProtocolPBClientImpl should implement Closeable
 Key: YARN-532
 URL: https://issues.apache.org/jira/browse/YARN-532
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Siddharth Seth
Assignee: Siddharth Seth


Required for RPC.stopProxy to work. Already done in most of the other 
protocols. (MAPREDUCE-5117 addressing the one other protocol missing this)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-528) Make IDs read only

2013-04-02 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620326#comment-13620326
 ] 

Robert Joseph Evans commented on YARN-528:
--

Like I said, I am fine with splitting the MR changes from the YARN changes. I put 
this out here more as a question of how we want to go about implementing 
these changes, and the test was more of a prototype example.

I personally lean more towards using the *Proto classes directly.  Why have 
something else wrapping it if we don't need it, even if it is a small and 
simple layer?  The only reason I did not go that route here is because of 
toString().  With the IDs we rely on having ID.toString() turn into something 
very specific that can be parsed and turned back into an instance of the 
object.  If I had the time I would trace down all places where we call toString 
on them and replace it with something else. I may just scale back the scope of 
the patch to look at ApplicationID to begin with and try to see if I can 
accomplish this.

bq. Wrapping the object which came over the wire - with a goal of creating 
fewer objects.

I really don't understand how this is supposed to work.  How do we create fewer 
objects by wrapping them in more objects? I can see us doing something like 
deduping the objects that come over the wire, but I don't see how wrapping 
works here.  
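
A minimal sketch of the property being relied on above - an immutable ID whose 
toString() output can be parsed back into an equivalent instance. SimpleAppId 
and its string format are illustrative stand-ins, not YARN's actual 
ApplicationId:

{code}
// Sketch only: immutable value object with a round-trippable toString()/fromString().
public final class SimpleAppId {
  private static final String PREFIX = "appid";

  private final long clusterTimestamp;
  private final int id;

  public SimpleAppId(long clusterTimestamp, int id) {
    this.clusterTimestamp = clusterTimestamp;
    this.id = id;
  }

  @Override
  public String toString() {
    return PREFIX + "_" + clusterTimestamp + "_" + id;
  }

  /** Parses a string produced by toString() back into an equivalent ID. */
  public static SimpleAppId fromString(String s) {
    String[] parts = s.split("_");
    if (parts.length != 3 || !PREFIX.equals(parts[0])) {
      throw new IllegalArgumentException("Not a valid id: " + s);
    }
    return new SimpleAppId(Long.parseLong(parts[1]), Integer.parseInt(parts[2]));
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SimpleAppId)) {
      return false;
    }
    SimpleAppId other = (SimpleAppId) o;
    return clusterTimestamp == other.clusterTimestamp && id == other.id;
  }

  @Override
  public int hashCode() {
    return 31 * Long.valueOf(clusterTimestamp).hashCode() + id;
  }
}
{code}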

> Make IDs read only
> --
>
> Key: YARN-528
> URL: https://issues.apache.org/jira/browse/YARN-528
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: YARN-528.txt, YARN-528.txt
>
>
> I really would like to rip out most if not all of the abstraction layer that 
> sits in-between Protocol Buffers, the RPC, and the actual user code.  We have 
> no plans to support any other serialization type, and the abstraction layer 
> just, makes it more difficult to change protocols, makes changing them more 
> error prone, and slows down the objects themselves.  
> Completely doing that is a lot of work.  This JIRA is a first step towards 
> that.  It makes the various ID objects immutable.  If this patch is wel 
> received I will try to go through other objects/classes of objects and update 
> them in a similar way.
> This is probably the last time we will be able to make a change like this 
> before 2.0 stabilizes and YARN APIs will not be able to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620291#comment-13620291
 ] 

Hadoop QA commented on YARN-479:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576654/YARN-479.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/650//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/650//console

This message is automatically generated.

> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jian He
> Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, 
> YARN-479.4.patch, YARN-479.5.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620284#comment-13620284
 ] 

Hadoop QA commented on YARN-101:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576650/YARN-101.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/649//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/649//console

This message is automatically generated.

> If  the heartbeat message loss, the nodestatus info of complete container 
> will loss too.
> 
>
> Key: YARN-101
> URL: https://issues.apache.org/jira/browse/YARN-101
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
> Environment: suse.
>Reporter: xieguiming
>Assignee: Xuan Gong
>Priority: Minor
> Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, 
> YARN-101.4.patch, YARN-101.5.patch
>
>
> see the red color:
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
>  protected void startStatusUpdater() {
> new Thread("Node Status Updater") {
>   @Override
>   @SuppressWarnings("unchecked")
>   public void run() {
> int lastHeartBeatID = 0;
> while (!isStopped) {
>   // Send heartbeat
>   try {
> synchronized (heartbeatMonitor) {
>   heartbeatMonitor.wait(heartBeatInterval);
> }
> {color:red} 
> // Before we send the heartbeat, we get the NodeStatus,
> // whose method removes completed containers.
> NodeStatus nodeStatus = getNodeStatus();
>  {color}
> nodeStatus.setResponseId(lastHeartBeatID);
> 
> NodeHeartbeatRequest request = recordFactory
> .newRecordInstance(NodeHeartbeatRequest.class);
> request.setNodeStatus(nodeStatus);   
> {color:red} 
>// But if the nodeHeartbeat fails, we've already removed the completed 
> containers, so the RM never gets to know about them. We aren't handling a 
> nodeHeartbeat failure case here.
> HeartbeatResponse response =
>   resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
>{color} 
> if (response.getNodeAction() == NodeAction.SHUTDOWN) {
>   LOG
>   .info("Recieved SHUTDOWN signal from Resourcemanager as 
> part of heartbeat," +
>   " hence shutting down.");
>   NodeStatusUpdaterImpl.this.stop();
>   break;
> }
> if (response.getNodeAction() == NodeAction.REBOOT) {
>   LOG.info("Node is out of sync with ResourceManager,"
>   + " hence rebooting.");
>   NodeStatusUpdaterImpl.this.reboot();
>   break;
> }
> lastHeartBeatID = response.getResponseId();
> List<ContainerId> containersToCleanup = response
> .getContainersToCleanupList();
> if (containersToCleanup.size() != 0) {
>   dispatcher.getEventHandler().handle(
>   new CMgrCompletedContainersEvent(containersToCleanup));
> }
> List<ApplicationId> appsToCleanup =
> response.getApplicationsToCleanupList();
> //Only start tracking for keepAlive on FINISH_APP
> trackAppsForKeepAlive(appsToCleanup);
> if (appsToCleanup.size() != 0) {
>   dispatcher.getEventHandler().handle(
>   new CMgrCompletedAppsEvent(appsToCleanup));
> }
>   } catch (Throwable e) {
> // TODO Better error handling. Thread

[jira] [Updated] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats

2013-04-02 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-479:
-

Attachment: YARN-479.5.patch

> NM retry behavior for connection to RM should be similar for lost heartbeats
> 
>
> Key: YARN-479
> URL: https://issues.apache.org/jira/browse/YARN-479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jian He
> Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, 
> YARN-479.4.patch, YARN-479.5.patch
>
>
> Regardless of connection loss at the start or at an intermediate point, NM's 
> retry behavior to the RM should follow the same flow. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-528) Make IDs read only

2013-04-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620275#comment-13620275
 ] 

Siddharth Seth commented on YARN-528:
-

Yep, we'll likely only support a single serialization, which at this point is 
PB.
What the current approach was supposed to be good at:
1. Handling unknown fields (which proto already supports), which could make 
rolling upgrades etc easier.
2. Wrapping the object which came over the wire - with a goal of creating fewer 
objects.

I don't think the second point was really achieved, with the implementation 
getting complicated because the interfaces are mutable, contain lists, and 
support chained sets (clc.getResource().setMemory()). I think point one 
should continue to be maintained.

Do we want *Proto references in the APIs (client library versus Java protocol 
definition)? At the moment, these are only referenced in the PBImpls - and 
hidden by the abstraction layer.

What I don't like about the patch is Protos leaking into the object 
constructors. Instead, I think we could just use simple Java objects, with 
conversion at the RPC layer (I believe this is similar to the HDFS model). 
Unknown fields can be handled via byte[] arrays.
I'm guessing very few of the interfaces actually need to be mutable - so in 
that sense, yes, this needs to be done before beta. OTOH, changing the PBImpls 
themselves can be done at a later point if required. (Earlier is of course better, 
and I'd be happy to help with this. I was planning on working on YARN-442 before 
you started this work.)
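
A minimal sketch of that model, assuming hypothetical ContainerIdValue and 
RpcConverter names (not existing YARN classes): a plain, immutable Java object 
used by user code, with proto conversion confined to the RPC layer and unknown 
fields carried along as opaque bytes:

{code}
// Sketch only: plain value object plus a conversion boundary at the RPC layer.
// All names here are assumptions made for illustration.
public final class ContainerIdValue {
  private final long applicationTimestamp;
  private final int applicationId;
  private final int containerId;
  private final byte[] unknownFields;   // preserved verbatim for forward compatibility

  public ContainerIdValue(long applicationTimestamp, int applicationId,
      int containerId, byte[] unknownFields) {
    this.applicationTimestamp = applicationTimestamp;
    this.applicationId = applicationId;
    this.containerId = containerId;
    this.unknownFields = unknownFields == null ? new byte[0] : unknownFields.clone();
  }

  public long getApplicationTimestamp() { return applicationTimestamp; }
  public int getApplicationId() { return applicationId; }
  public int getContainerId() { return containerId; }
  public byte[] getUnknownFields() { return unknownFields.clone(); }
}

// Conversion lives only at the RPC layer; user code never sees the proto type.
interface RpcConverter<V, P> {
  P toProto(V value);      // serialize, re-attaching any unknown-field bytes
  V fromProto(P proto);    // deserialize, capturing unknown-field bytes
}
{code}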

> Make IDs read only
> --
>
> Key: YARN-528
> URL: https://issues.apache.org/jira/browse/YARN-528
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: YARN-528.txt, YARN-528.txt
>
>
> I really would like to rip out most if not all of the abstraction layer that 
> sits in-between Protocol Buffers, the RPC, and the actual user code.  We have 
> no plans to support any other serialization type, and the abstraction layer 
> just makes it more difficult to change protocols, makes changing them more 
> error prone, and slows down the objects themselves.  
> Completely doing that is a lot of work.  This JIRA is a first step towards 
> that.  It makes the various ID objects immutable.  If this patch is well 
> received I will try to go through other objects/classes of objects and update 
> them in a similar way.
> This is probably the last time we will be able to make a change like this 
> before 2.0 stabilizes and YARN APIs will not be able to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land

2013-04-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-486:
--

Assignee: Xuan Gong  (was: Bikas Saha)

> Change startContainer NM API to accept Container as a parameter and make 
> ContainerLaunchContext user land
> -
>
> Key: YARN-486
> URL: https://issues.apache.org/jira/browse/YARN-486
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Xuan Gong
>
> Currently, id, resource request etc need to be copied over from Container to 
> ContainerLaunchContext. This can be brittle. Also it leads to duplication of 
> information (such as Resource from CLC and Resource from Container and 
> Container.tokens). Sending Container directly to startContainer solves these 
> problems. It also keeps the CLC clean by only having stuff in it that is set by 
> the client/AM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.

2013-04-02 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620261#comment-13620261
 ] 

Xuan Gong commented on YARN-101:


1. Use YarnServerBuilderUtils for constructing the node-heartbeat response
2. Use BuilderUtils to create ApplicationId, ContainerId, ContainerStatus, etc.
3. Recreated the test case as the last comment suggested

> If  the heartbeat message loss, the nodestatus info of complete container 
> will loss too.
> 
>
> Key: YARN-101
> URL: https://issues.apache.org/jira/browse/YARN-101
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
> Environment: suse.
>Reporter: xieguiming
>Assignee: Xuan Gong
>Priority: Minor
> Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, 
> YARN-101.4.patch, YARN-101.5.patch
>
>
> see the red color:
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
>  protected void startStatusUpdater() {
> new Thread("Node Status Updater") {
>   @Override
>   @SuppressWarnings("unchecked")
>   public void run() {
> int lastHeartBeatID = 0;
> while (!isStopped) {
>   // Send heartbeat
>   try {
> synchronized (heartbeatMonitor) {
>   heartbeatMonitor.wait(heartBeatInterval);
> }
> {color:red} 
> // Before we send the heartbeat, we get the NodeStatus,
> // whose method removes completed containers.
> NodeStatus nodeStatus = getNodeStatus();
>  {color}
> nodeStatus.setResponseId(lastHeartBeatID);
> 
> NodeHeartbeatRequest request = recordFactory
> .newRecordInstance(NodeHeartbeatRequest.class);
> request.setNodeStatus(nodeStatus);   
> {color:red} 
>// But if the nodeHeartbeat fails, we've already removed the completed 
> containers, so the RM never gets to know about them. We aren't handling a 
> nodeHeartbeat failure case here.
> HeartbeatResponse response =
>   resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
>{color} 
> if (response.getNodeAction() == NodeAction.SHUTDOWN) {
>   LOG
>   .info("Recieved SHUTDOWN signal from Resourcemanager as 
> part of heartbeat," +
>   " hence shutting down.");
>   NodeStatusUpdaterImpl.this.stop();
>   break;
> }
> if (response.getNodeAction() == NodeAction.REBOOT) {
>   LOG.info("Node is out of sync with ResourceManager,"
>   + " hence rebooting.");
>   NodeStatusUpdaterImpl.this.reboot();
>   break;
> }
> lastHeartBeatID = response.getResponseId();
> List<ContainerId> containersToCleanup = response
> .getContainersToCleanupList();
> if (containersToCleanup.size() != 0) {
>   dispatcher.getEventHandler().handle(
>   new CMgrCompletedContainersEvent(containersToCleanup));
> }
> List<ApplicationId> appsToCleanup =
> response.getApplicationsToCleanupList();
> //Only start tracking for keepAlive on FINISH_APP
> trackAppsForKeepAlive(appsToCleanup);
> if (appsToCleanup.size() != 0) {
>   dispatcher.getEventHandler().handle(
>   new CMgrCompletedAppsEvent(appsToCleanup));
> }
>   } catch (Throwable e) {
> // TODO Better error handling. Thread can die with the rest of the
> // NM still running.
> LOG.error("Caught exception in status-updater", e);
>   }
> }
>   }
> }.start();
>   }
>   private NodeStatus getNodeStatus() {
> NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class);
> nodeStatus.setNodeId(this.nodeId);
> int numActiveContainers = 0;
> List<ContainerStatus> containersStatuses = new 
> ArrayList<ContainerStatus>();
> for (Iterator<Entry<ContainerId, Container>> i =
> this.context.getContainers().entrySet().iterator(); i.hasNext();) {
> Entry<ContainerId, Container> e = i.next();
>   ContainerId containerId = e.getKey();
>   Container container = e.getValue();
>   // Clone the container to send it to the RM
>   org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = 
>   container.cloneAndGetContainerStatus();
>   containersStatuses.add(containerStatus);
>   ++numActiveContainers;
>   LOG.info("Sending out status for container: " + containerStatus);
>   {color:red} 
>   // Here is the part that removes the completed containers.
>   if (containerStatus.getState() == ContainerState.COMPLETE) {
>   

[jira] [Updated] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.

2013-04-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-101:
---

Attachment: YARN-101.5.patch

> If  the heartbeat message loss, the nodestatus info of complete container 
> will loss too.
> 
>
> Key: YARN-101
> URL: https://issues.apache.org/jira/browse/YARN-101
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
> Environment: suse.
>Reporter: xieguiming
>Assignee: Xuan Gong
>Priority: Minor
> Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, 
> YARN-101.4.patch, YARN-101.5.patch
>
>
> see the red color:
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
>  protected void startStatusUpdater() {
> new Thread("Node Status Updater") {
>   @Override
>   @SuppressWarnings("unchecked")
>   public void run() {
> int lastHeartBeatID = 0;
> while (!isStopped) {
>   // Send heartbeat
>   try {
> synchronized (heartbeatMonitor) {
>   heartbeatMonitor.wait(heartBeatInterval);
> }
> {color:red} 
> // Before we send the heartbeat, we get the NodeStatus,
> // whose method removes completed containers.
> NodeStatus nodeStatus = getNodeStatus();
>  {color}
> nodeStatus.setResponseId(lastHeartBeatID);
> 
> NodeHeartbeatRequest request = recordFactory
> .newRecordInstance(NodeHeartbeatRequest.class);
> request.setNodeStatus(nodeStatus);   
> {color:red} 
>// But if the nodeHeartbeat fails, we've already removed the completed 
> containers, so the RM never gets to know about them. We aren't handling a 
> nodeHeartbeat failure case here.
> HeartbeatResponse response =
>   resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
>{color} 
> if (response.getNodeAction() == NodeAction.SHUTDOWN) {
>   LOG
>   .info("Recieved SHUTDOWN signal from Resourcemanager as 
> part of heartbeat," +
>   " hence shutting down.");
>   NodeStatusUpdaterImpl.this.stop();
>   break;
> }
> if (response.getNodeAction() == NodeAction.REBOOT) {
>   LOG.info("Node is out of sync with ResourceManager,"
>   + " hence rebooting.");
>   NodeStatusUpdaterImpl.this.reboot();
>   break;
> }
> lastHeartBeatID = response.getResponseId();
> List<ContainerId> containersToCleanup = response
> .getContainersToCleanupList();
> if (containersToCleanup.size() != 0) {
>   dispatcher.getEventHandler().handle(
>   new CMgrCompletedContainersEvent(containersToCleanup));
> }
> List<ApplicationId> appsToCleanup =
> response.getApplicationsToCleanupList();
> //Only start tracking for keepAlive on FINISH_APP
> trackAppsForKeepAlive(appsToCleanup);
> if (appsToCleanup.size() != 0) {
>   dispatcher.getEventHandler().handle(
>   new CMgrCompletedAppsEvent(appsToCleanup));
> }
>   } catch (Throwable e) {
> // TODO Better error handling. Thread can die with the rest of the
> // NM still running.
> LOG.error("Caught exception in status-updater", e);
>   }
> }
>   }
> }.start();
>   }
>   private NodeStatus getNodeStatus() {
> NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class);
> nodeStatus.setNodeId(this.nodeId);
> int numActiveContainers = 0;
> List<ContainerStatus> containersStatuses = new 
> ArrayList<ContainerStatus>();
> for (Iterator<Entry<ContainerId, Container>> i =
> this.context.getContainers().entrySet().iterator(); i.hasNext();) {
> Entry<ContainerId, Container> e = i.next();
>   ContainerId containerId = e.getKey();
>   Container container = e.getValue();
>   // Clone the container to send it to the RM
>   org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = 
>   container.cloneAndGetContainerStatus();
>   containersStatuses.add(containerStatus);
>   ++numActiveContainers;
>   LOG.info("Sending out status for container: " + containerStatus);
>   {color:red} 
>   // Here is the part that removes the completed containers.
>   if (containerStatus.getState() == ContainerState.COMPLETE) {
> // Remove
> i.remove();
>   {color} 
> LOG.info("Removed completed container " + containerId);
>   }
> }
> nodeStatus.setContainersStatuses(containersStatuses);
> LOG.debug(this.nodeId + " sendin

[jira] [Commented] (YARN-527) Local filecache mkdir fails

2013-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620239#comment-13620239
 ] 

Vinod Kumar Vavilapalli commented on YARN-527:
--

Is there any difference between how the NodeManager tried to create the dir and your 
manual creation? Like the user running the NM versus the user who manually created the 
dir? Can you reproduce this? If we can find out exactly why the NM couldn't create 
it automatically, then we can do something about it.

> Local filecache mkdir fails
> ---
>
> Key: YARN-527
> URL: https://issues.apache.org/jira/browse/YARN-527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.0-alpha
> Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes 
> and six worker nodes.
>Reporter: Knut O. Hellan
>Priority: Minor
> Attachments: yarn-site.xml
>
>
> Jobs failed with no other explanation than this stack trace:
> 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diag
> nostics report from attempt_1364591875320_0017_m_00_0: 
> java.io.IOException: mkdir of /disk3/yarn/local/filecache/-42307893
> 55400878397 failed
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932)
> at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
> at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Manually creating the directory worked. This behavior was common to at least 
> several nodes in the cluster.
> The situation was resolved by removing and recreating all 
> /disk?/yarn/local/filecache directories on all nodes.
> It is unclear whether Yarn struggled with the number of files or if there 
> were corrupt files in the caches. The situation was triggered by a node dying.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-529) Succeeded MR job is retried by RM if finishApplicationMaster() call fails

2013-04-02 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-529:


Issue Type: Improvement  (was: Sub-task)
Parent: (was: YARN-128)

> Succeeded MR job is retried by RM if finishApplicationMaster() call fails
> -
>
> Key: YARN-529
> URL: https://issues.apache.org/jira/browse/YARN-529
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>
> The MR app master will clean the staging dir if the job has already succeeded 
> and the AM is asked to reboot. If the finishApplicationMaster call fails, the RM 
> will consider this job unfinished and launch further attempts; those attempts 
> will fail because the staging dir has been cleaned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-529) Succeeded MR job is retried by RM if finishApplicationMaster() call fails

2013-04-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620235#comment-13620235
 ] 

Bikas Saha commented on YARN-529:
-

This problem is related to RM restart but independent of it. Even without 
restart, if for some reason the unregister from the RM fails during MR app master 
shutdown, the app master will continue and delete the staging dir etc. Since the 
RM did not get an unregister, it will retry the MR app and all subsequent 
attempts will fail.
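
A minimal sketch of the ordering implied here, assuming hypothetical 
unregisterFromRM() and cleanupStagingDir() stand-ins for the real AM methods: 
the staging dir is deleted only after finishApplicationMaster() succeeds, so a 
failed unregister leaves it intact for a retried attempt:

{code}
// Sketch only, not the eventual fix: defer staging-dir cleanup until the RM has
// acknowledged the unregister. Method names are assumptions for illustration.
public class ShutdownOrderSketch {

  public void shutdown(int maxRetries) throws Exception {
    boolean unregistered = false;
    for (int attempt = 0; attempt < maxRetries && !unregistered; attempt++) {
      try {
        unregisterFromRM();          // the finishApplicationMaster() call
        unregistered = true;
      } catch (Exception e) {
        Thread.sleep(1000L);         // back off and retry the unregister
      }
    }
    if (unregistered) {
      cleanupStagingDir();           // safe: the RM will not retry this attempt
    }
    // If the unregister never succeeded, leave the staging dir in place so a
    // retried attempt can still find it.
  }

  private void unregisterFromRM() throws Exception { /* RPC to the RM */ }

  private void cleanupStagingDir() { /* delete the job staging directory */ }
}
{code}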

> Succeeded MR job is retried by RM if finishApplicationMaster() call fails
> -
>
> Key: YARN-529
> URL: https://issues.apache.org/jira/browse/YARN-529
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>
> The MR app master will clean the staging dir if the job has already succeeded 
> and the AM is asked to reboot. If the finishApplicationMaster call fails, the RM 
> will consider this job unfinished and launch further attempts; those attempts 
> will fail because the staging dir has been cleaned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-529) Succeeded MR job is retried by RM if finishApplicationMaster() call fails

2013-04-02 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-529:


Summary: Succeeded MR job is retried by RM if finishApplicationMaster() 
call fails  (was: Succeeded RM job is retried by RM if 
finishApplicationMaster() call fails)

> Succeeded MR job is retried by RM if finishApplicationMaster() call fails
> -
>
> Key: YARN-529
> URL: https://issues.apache.org/jira/browse/YARN-529
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>
> The MR app master will clean the staging dir if the job has already succeeded 
> and the AM is asked to reboot. If the finishApplicationMaster call fails, the RM 
> will consider this job unfinished and launch further attempts; those attempts 
> will fail because the staging dir has been cleaned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-529) Succeeded RM job is retried by RM if finishApplicationMaster() call fails

2013-04-02 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-529:


Summary: Succeeded RM job is retried by RM if finishApplicationMaster() 
call fails  (was: MR app master clean staging dir when reboot command sent from 
RM while the MR job succeeded)

> Succeeded RM job is retried by RM if finishApplicationMaster() call fails
> -
>
> Key: YARN-529
> URL: https://issues.apache.org/jira/browse/YARN-529
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>
> The MR app master will clean the staging dir if the job has already succeeded 
> and the AM is asked to reboot. If the finishApplicationMaster call fails, the RM 
> will consider this job unfinished and launch further attempts; those attempts 
> will fail because the staging dir has been cleaned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded

2013-04-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620230#comment-13620230
 ] 

Bikas Saha commented on YARN-529:
-

By 1), do you mean letting the RM accept finishApplicationAttempt() from the last attempt?

> MR app master clean staging dir when reboot command sent from RM while the MR 
> job succeeded
> ---
>
> Key: YARN-529
> URL: https://issues.apache.org/jira/browse/YARN-529
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>
> The MR app master will clean the staging dir if the job has already succeeded 
> and the AM is asked to reboot. If the finishApplicationMaster call fails, the RM 
> will consider this job unfinished and launch further attempts; those attempts 
> will fail because the staging dir has been cleaned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded

2013-04-02 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-529:


Description: MR app master will clean staging dir, if the job is already 
succeeded and asked to reboot. If the finishApplicationMaster call fails, RM 
will consider this job unfinished and launch further attempts, further attempts 
will fail because staging dir is cleaned  (was: MR app master will clean 
staging dir, if the job is already succeeded and asked to reboot. RM will 
consider this job unsuccessful and launch further attempts, further attempts 
will fail because staging dir is cleaned)

> MR app master clean staging dir when reboot command sent from RM while the MR 
> job succeeded
> ---
>
> Key: YARN-529
> URL: https://issues.apache.org/jira/browse/YARN-529
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
>
> The MR app master will clean the staging dir if the job has already succeeded 
> and the AM is asked to reboot. If the finishApplicationMaster call fails, the RM 
> will consider this job unfinished and launch further attempts; those attempts 
> will fail because the staging dir has been cleaned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-531) RM nodes page should show time-since-last-heartbeat instead of absolute last-heartbeat time

2013-04-02 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-531:


 Summary: RM nodes page should show time-since-last-heartbeat 
instead of absolute last-heartbeat time
 Key: YARN-531
 URL: https://issues.apache.org/jira/browse/YARN-531
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Absolute last-heartbeat time is absolutely useless ;) We need to replace it 
with time since last heartbeat.
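
A tiny sketch of the proposed display, assuming an illustrative helper (not the 
actual RM web UI code) that renders elapsed time instead of the absolute 
timestamp:

{code}
// Sketch only: format "time since last heartbeat" for display on the nodes page.
public class HeartbeatAge {

  /** Formats "N sec ago" / "N min ago" from a last-heartbeat timestamp. */
  public static String timeSinceLastHeartbeat(long lastHeartbeatMillis, long nowMillis) {
    long elapsedSec = Math.max(0, (nowMillis - lastHeartbeatMillis) / 1000);
    if (elapsedSec < 60) {
      return elapsedSec + " sec ago";
    }
    return (elapsedSec / 60) + " min ago";
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    System.out.println(timeSinceLastHeartbeat(now - 42000L, now));  // "42 sec ago"
  }
}
{code}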

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-531) RM nodes page should show time-since-last-heartbeat instead of absolute last-heartbeat time

2013-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-531:
-

Labels: usability  (was: )

> RM nodes page should show time-since-last-heartbeat instead of absolute 
> last-heartbeat time
> ---
>
> Key: YARN-531
> URL: https://issues.apache.org/jira/browse/YARN-531
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>  Labels: usability
>
> Absolute last-heartbeat time is absolutely useless ;) We need to replace it 
> with time since last heartbeat.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-528) Make IDs read only

2013-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620170#comment-13620170
 ] 

Vinod Kumar Vavilapalli commented on YARN-528:
--

bq. We have no plans to support any other serialization type, and the 
abstraction layer just makes it more difficult to change protocols, makes 
changing them more error prone, and slows down the objects themselves. 
We have to make a call on this; I don't think we explicitly took that decision 
yet. That said, I am inclined to throw it away, but there were a couple of 
reasons why we put this in (like being able to pass through unidentified fields, 
e.g. from a new RM to a new NM via an old AM). I would like a day or two to dig 
into those with knowledgeable folks offline. Thanks for your patience.

Oh, and let's separate the tickets into MR-only and YARN-only changes please - there 
isn't any pain, as they are all orthogonal changes.

> Make IDs read only
> --
>
> Key: YARN-528
> URL: https://issues.apache.org/jira/browse/YARN-528
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: YARN-528.txt, YARN-528.txt
>
>
> I really would like to rip out most if not all of the abstraction layer that 
> sits in-between Protocol Buffers, the RPC, and the actual user code.  We have 
> no plans to support any other serialization type, and the abstraction layer 
> just makes it more difficult to change protocols, makes changing them more 
> error prone, and slows down the objects themselves.  
> Completely doing that is a lot of work.  This JIRA is a first step towards 
> that.  It makes the various ID objects immutable.  If this patch is well 
> received I will try to go through other objects/classes of objects and update 
> them in a similar way.
> This is probably the last time we will be able to make a change like this 
> before 2.0 stabilizes and YARN APIs will not be able to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-117) Enhance YARN service model

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620163#comment-13620163
 ] 

Hadoop QA commented on YARN-117:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576620/YARN-117.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 28 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
33 warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 10 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:

  org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup
  org.apache.hadoop.mapreduce.security.ssl.TestEncryptedShuffle
  org.apache.hadoop.mapred.TestNetworkedJob
  org.apache.hadoop.mapred.TestClusterMRNotification
  org.apache.hadoop.mapred.TestJobCounters
  org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner
  org.apache.hadoop.mapred.TestMiniMRClasspath
  org.apache.hadoop.mapred.TestBlockLimits
  org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers
  org.apache.hadoop.mapred.TestMiniMRChildTask
  org.apache.hadoop.mapreduce.security.TestMRCredentials
  org.apache.hadoop.mapreduce.v2.TestNonExistentJob
  org.apache.hadoop.mapreduce.v2.TestRMNMInfo
  org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser
  org.apache.hadoop.mapreduce.v2.TestMROldApiJobs
  org.apache.hadoop.mapreduce.TestMapReduceLazyOutput
  org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution
  org.apache.hadoop.mapred.TestJobCleanup
  org.apache.hadoop.mapred.TestReduceFetch
  org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
  org.apache.hadoop.mapred.TestMerge
  org.apache.hadoop.mapreduce.v2.TestMRJobs
  org.apache.hadoop.mapreduce.TestChild
  org.apache.hadoop.mapred.TestJobName
  org.apache.hadoop.mapred.TestLazyOutput
  org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
  org.apache.hadoop.mapreduce.v2.TestUberAM
  org.apache.hadoop.mapred.TestMiniMRClientCluster
  org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
  org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService
  org.apache.hadoop.mapred.TestClusterMapReduceTestCase
  org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter
  org.apache.hadoop.ipc.TestSocketFactory
  org.apache.hadoop.mapred.TestJobSysDirWithDFS
  
org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher
  
org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/648//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/648//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/648//artifact/tru

[jira] [Resolved] (YARN-442) The ID classes should be immutable

2013-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-442.
--

Resolution: Duplicate
  Assignee: (was: Xuan Gong)

YARN-528 is fixing this, closing as duplicate.

> The ID classes should be immutable
> --
>
> Key: YARN-442
> URL: https://issues.apache.org/jira/browse/YARN-442
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>
> ApplicationId, ApplicationAttemptId, ContainerId should be immutable. That 
> should allow for a simpler implementation as well as remove synchronization.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-528) Make IDs read only

2013-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-528:
-

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-386

> Make IDs read only
> --
>
> Key: YARN-528
> URL: https://issues.apache.org/jira/browse/YARN-528
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: YARN-528.txt, YARN-528.txt
>
>
> I really would like to rip out most if not all of the abstraction layer that 
> sits in-between Protocol Buffers, the RPC, and the actual user code.  We have 
> no plans to support any other serialization type, and the abstraction layer 
> just makes it more difficult to change protocols, makes changing them more 
> error prone, and slows down the objects themselves.  
> Completely doing that is a lot of work.  This JIRA is a first step towards 
> that.  It makes the various ID objects immutable.  If this patch is well 
> received I will try to go through other objects/classes of objects and update 
> them in a similar way.
> This is probably the last time we will be able to make a change like this 
> before 2.0 stabilizes and YARN APIs will not be able to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities

2013-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-382:
-

Fix Version/s: 2.0.5-beta

> SchedulerUtils improve way normalizeRequest sets the resource capabilities
> --
>
> Key: YARN-382
> URL: https://issues.apache.org/jira/browse/YARN-382
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Thomas Graves
>Assignee: Zhijie Shen
> Fix For: 2.0.5-beta
>
> Attachments: YARN-382_1.patch, YARN-382_2.patch, YARN-382_demo.patch
>
>
> In YARN-370, we changed it from setting the capability to directly setting 
> memory and cores:
> -ask.setCapability(normalized);
> +ask.getCapability().setMemory(normalized.getMemory());
> +ask.getCapability().setVirtualCores(normalized.getVirtualCores());
> We did this because it is directly setting the values in the original 
> resource object passed in when the AM gets allocated and without it the AM 
> doesn't get the resource normalized correctly in the submission context. See 
> YARN-370 for more details.
> I think we should find a better way of doing this long term, one so we don't 
> have to keep adding things there when new resources are added, two because 
> its a bit confusing as to what its doing and prone to someone accidentally 
> breaking it in the future again.  Something closer to what Arun suggested in 
> YARN-370 would be better but we need to make sure all the places work and get 
> some more testing on it before putting it in. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620097#comment-13620097
 ] 

Hudson commented on YARN-382:
-

Integrated in Hadoop-trunk-Commit #3549 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3549/])
YARN-382. SchedulerUtils improve way normalizeRequest sets the resource 
capabilities (Zhijie Shen via bikas) (Revision 1463653)

 Result = SUCCESS
bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463653
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java


> SchedulerUtils improve way normalizeRequest sets the resource capabilities
> --
>
> Key: YARN-382
> URL: https://issues.apache.org/jira/browse/YARN-382
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Thomas Graves
>Assignee: Zhijie Shen
> Attachments: YARN-382_1.patch, YARN-382_2.patch, YARN-382_demo.patch
>
>
> In YARN-370, we changed it from setting the capability to directly setting 
> memory and cores:
> -ask.setCapability(normalized);
> +ask.getCapability().setMemory(normalized.getMemory());
> +ask.getCapability().setVirtualCores(normalized.getVirtualCores());
> We did this because it is directly setting the values in the original 
> resource object passed in when the AM gets allocated and without it the AM 
> doesn't get the resource normalized correctly in the submission context. See 
> YARN-370 for more details.
> I think we should find a better way of doing this long term, one so we don't 
> have to keep adding things there when new resources are added, two because 
> its a bit confusing as to what its doing and prone to someone accidentally 
> breaking it in the future again.  Something closer to what Arun suggested in 
> YARN-370 would be better but we need to make sure all the places work and get 
> some more testing on it before putting it in. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620096#comment-13620096
 ] 

Hadoop QA commented on YARN-530:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576617/YARN-530.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
33 warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/647//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/647//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/647//console

This message is automatically generated.

> Define Service model strictly, implement AbstractService for robust 
> subclassing, migrate yarn-common services
> -
>
> Key: YARN-530
> URL: https://issues.apache.org/jira/browse/YARN-530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-117changes.pdf, YARN-530.patch
>
>
> # Extend the YARN {{Service}} interface as discussed in YARN-117
> # Implement the changes in {{AbstractService}} and {{FilterService}}.
> # Migrate all services in yarn-common to the more robust service model, test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded

2013-04-02 Thread jian he (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620092#comment-13620092
 ] 

jian he commented on YARN-529:
--

Several possible solutions:
1. Let the RM accept old attempts. Currently the RM raises an exception because 
of the unrecognized attempt and considers the job unsuccessful.
2. Only clean the staging dir after the AM has successfully unregistered with the 
RM. We can use a flag to indicate this, or modify the state machine so that on 
JOB_AM_REBOOT it transitions from SUCCEEDED to REBOOT. The potential problem is 
that, by the time the job transitions to the SUCCEEDED state, some job-success 
metrics have already been triggered. (A rough sketch of the flag idea is below.)
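
A self-contained sketch of the flag idea in option 2 (class and method names here 
are hypothetical, not the actual MRAppMaster code):

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper for option 2: defer staging-dir cleanup until the AM
// has successfully unregistered with the RM.
class StagingDirCleaner {
  private final FileSystem fs;
  private final Path stagingDir;
  private volatile boolean unregistered = false;

  StagingDirCleaner(FileSystem fs, Path stagingDir) {
    this.fs = fs;
    this.stagingDir = stagingDir;
  }

  // Call after FinishApplicationMaster succeeds.
  void markUnregistered() {
    unregistered = true;
  }

  // Call from the job-finish path; if a reboot arrived before the unregister
  // completed, the staging dir is left in place for the next attempt.
  void cleanIfSafe() throws IOException {
    if (unregistered) {
      fs.delete(stagingDir, true);
    }
  }
}
{code}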

> MR app master clean staging dir when reboot command sent from RM while the MR 
> job succeeded
> ---
>
> Key: YARN-529
> URL: https://issues.apache.org/jira/browse/YARN-529
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: jian he
>Assignee: jian he
>
> The MR app master will clean the staging dir if the job has already succeeded 
> and is asked to reboot. The RM will consider this job unsuccessful and launch 
> further attempts; those attempts will fail because the staging dir has been cleaned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-04-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620091#comment-13620091
 ] 

Bikas Saha commented on YARN-193:
-

Also, why are there so many normalize functions and why are we creating a new 
Resource object every time we normalize? We should fix this in a different jira 
though.

> Scheduler.normalizeRequest does not account for allocation requests that 
> exceed maximumAllocation limits 
> -
>
> Key: YARN-193
> URL: https://issues.apache.org/jira/browse/YARN-193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.2-alpha, 3.0.0
>Reporter: Hitesh Shah
>Assignee: Zhijie Shen
> Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
> MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.4.patch, 
> YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, 
> YARN-193.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-117) Enhance YARN service model

2013-04-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-117:


Attachment: YARN-117.patch

This is the across-all-yarn-projects patch (plus HADOOP-9447), just to show what 
the combined patch looks like and how it tests. YARN-530 contains the changes to 
yarn-common, which should be the first step. (This patch contains those.)

> Enhance YARN service model
> --
>
> Key: YARN-117
> URL: https://issues.apache.org/jira/browse/YARN-117
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-117.patch
>
>
> Having played with the YARN service model, there are some issues
> that I've identified based on past work and initial use.
> This JIRA issue is an overall one to cover the issues, with solutions pushed 
> out to separate JIRAs.
> h2. state model prevents stopped state being entered if you could not 
> successfully start the service.
> In the current lifecycle you cannot stop a service unless it was successfully 
> started, but
> * {{init()}} may acquire resources that need to be explicitly released
> * if the {{start()}} operation fails partway through, the {{stop()}} 
> operation may be needed to release resources.
> *Fix:* make {{stop()}} a valid state transition from all states and require 
> the implementations to be able to stop safely without requiring all fields to 
> be non null.
> Before anyone points out that the {{stop()}} operations assume that all 
> fields are valid and will NPE if called before {{start()}}: MAPREDUCE-3431 
> shows that this problem arises today, and MAPREDUCE-3502 is a fix for it. It is 
> independent of the rest of the issues in this doc but it will 
> aid making {{stop()}} execute from all states other than "stopped".
> MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
> review and take up; this can be done with issues linked to this one.
> h2. AbstractService doesn't prevent duplicate state change requests.
> The {{ensureState()}} checks to verify whether or not a state transition is 
> allowed from the current state are performed in the base {{AbstractService}} 
> class -yet subclasses tend to call this *after* their own {{init()}}, 
> {{start()}} & {{stop()}} operations. This means that these operations can be 
> performed out of order, and even if the outcome of the call is an exception, 
> all actions performed by the subclasses will have taken place. MAPREDUCE-3877 
> demonstrates this.
> This is a tricky one to address. In HADOOP-3128 I used a base class instead 
> of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods 
> {{final}}. These methods would do the checks, and then invoke protected inner 
> methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to 
> retrofit the same behaviour to everything that extends {{AbstractService}} 
> -something that must be done before the class is considered stable (because 
> once the lifecycle methods are declared final, all subclasses that are out of 
> the source tree will need fixing by the respective developers).
> h2. AbstractService state change doesn't defend against race conditions.
> There's no concurrency locks on the state transitions. Whatever fix for wrong 
> state calls is added should correct this to prevent re-entrancy, such as 
> {{stop()}} being called from two threads.
> h2. Static methods to choreograph lifecycle operations
> Helper methods to move things through lifecycles. init->start is common, 
> stop-if-service!=null another. Some static methods can execute these, and 
> even call {{stop()}} if {{init()}} raises an exception. These could go into a 
> class {{ServiceOps}} in the same package. These can be used by those services 
> that wrap other services, and help manage more robust shutdowns.
> h2. state transition failures are something that registered service listeners 
> may wish to be informed of.
> When a state transition fails a {{RuntimeException}} can be thrown -and the 
> service listeners are not informed as the notification point isn't reached. 
> They may wish to know this, especially for management and diagnostics.
> *Fix:* extend {{ServiceStateChangeListener}} with a callback such as 
> {{stateChangeFailed(Service service,Service.State targeted-state, 
> RuntimeException e)}} that is invoked from the (final) state change methods 
> in the {{AbstractService}} class (once they delegate to their inner 
> {{innerStart()}}, {{innerStop()}} methods); make it a no-op in the existing 
> implementations of the interface.
> h2. Service listener failures not handled
> Is this an error or not? Log-and-ignore may not be what is desired.
> *Proposed:* during {{stop()}} any exception by a listener is caught and 
> discarded, t
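
A minimal sketch of the lifecycle pattern described above (final lifecycle methods 
own the state checks and locking, then delegate to protected inner methods). The 
names and the state set are assumptions taken from this description, not a final API:

{code}
import org.apache.hadoop.conf.Configuration;

public abstract class SketchService {
  public enum STATE { NOTINITED, INITED, STARTED, STOPPED }

  private final Object stateLock = new Object();
  private STATE state = STATE.NOTINITED;

  public final void init(Configuration conf) {
    synchronized (stateLock) {
      if (state != STATE.NOTINITED) {
        throw new IllegalStateException("Cannot init from " + state);
      }
      innerInit(conf);
      state = STATE.INITED;
    }
  }

  public final void start() {
    synchronized (stateLock) {
      if (state != STATE.INITED) {
        throw new IllegalStateException("Cannot start from " + state);
      }
      innerStart();
      state = STATE.STARTED;
    }
  }

  // stop() is valid from every state; implementations must tolerate partially
  // initialized fields, and a repeated stop() is a no-op.
  public final void stop() {
    synchronized (stateLock) {
      if (state == STATE.STOPPED) {
        return;
      }
      innerStop();
      state = STATE.STOPPED;
    }
  }

  protected void innerInit(Configuration conf) {}
  protected void innerStart() {}
  protected void innerStop() {}
}
{code}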

[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-04-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620066#comment-13620066
 ] 

Bikas Saha commented on YARN-193:
-

Can we check that we are getting the expected exception and not some other one?
{code}
+try {
+  rmService.submitApplication(submitRequest);
+  Assert.fail("Application submission should fail because");
+} catch (YarnRemoteException e) {
+  // Exception is expected
+}
+  }
{code}
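
For example, something along these lines would pin the failure down to the 
invalid-resource case (a sketch only; the exact exception type and message text 
are assumptions):

{code}
try {
  rmService.submitApplication(submitRequest);
  Assert.fail("Application submission should have failed for an invalid resource request");
} catch (YarnRemoteException e) {
  // Assumption: the validation failure message mentions the invalid resource
  // request; adjust the expected text to whatever the patch actually throws.
  Assert.assertTrue(e.getMessage(),
      e.getMessage().contains("Invalid resource request"));
}
{code}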

Setting the same config twice? In the second set, why not use a -ve value instead 
of the DISABLE value? It's not clear whether we want to disable the check or set a 
-ve value. Same for the others.
{code}
+conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 0);
+conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
+ResourceCalculator.DISABLE_RESOURCELIMIT_CHECK);
+try {
+  resourceManager.init(conf);
+  fail("Exception is expected because the min memory allocation is" +
+  " non-positive.");
+} catch (YarnException e) {
+  // Exception is expected.
{code}

Let's also add a test for the case when memory is more than max. Normalize should 
always reduce that to max. Same for DRF.
{code}
+// max is not a multiple of min
+maxResource = Resources.createResource(maxMemory - 10, 0);
+ask.setCapability(Resources.createResource(maxMemory - 100));
+// multiple of minMemory > maxMemory, then reduce to maxMemory
+SchedulerUtils.normalizeRequest(ask, resourceCalculator, null,
+minResource, maxResource);
+assertEquals(maxResource.getMemory(), ask.getCapability().getMemory());
   }
{code}

Rename testAppSubmitError() to show that it's testing an invalid resource request?

TestAMRMClient. Why is this change needed?
{code}
+amResource.setMemory(
+YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
+amContainer.setResource(amResource);
{code}

Don't we need to throw?
{code}
+  } catch (InvalidResourceRequestException e) {
+LOG.info("Resource request was not able to be alloacated for" +
+" application attempt " + appAttemptId + " because it" +
+" failed to pass the validation. " + e.getMessage());
+RPCUtil.getRemoteException(e);
+  }
{code}

typo
{code}
+// validate scheduler vcors allocation setting
{code}

This will need to be rebased after YARN-382 which I am going to commit shortly.

I am fine with requiring that a max allocation limit be set. We should also 
make sure that max allocation from config can be matched by at least 1 machine 
in the cluster. That should be a different jira.

IMO, normalization should be called only inside the scheduler. It is an 
artifact of the scheduler logic. Nothing in the RM requires resources to be 
normalized to a multiple of min. Only the scheduler needs it to make its life 
easier, and it could choose not to do so.



> Scheduler.normalizeRequest does not account for allocation requests that 
> exceed maximumAllocation limits 
> -
>
> Key: YARN-193
> URL: https://issues.apache.org/jira/browse/YARN-193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.2-alpha, 3.0.0
>Reporter: Hitesh Shah
>Assignee: Zhijie Shen
> Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
> MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.4.patch, 
> YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, 
> YARN-193.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-04-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-530:


Attachment: YARN-530.patch

This is the subset of YARN-117 for yarn-common

> Define Service model strictly, implement AbstractService for robust 
> subclassing, migrate yarn-common services
> -
>
> Key: YARN-530
> URL: https://issues.apache.org/jira/browse/YARN-530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-117changes.pdf, YARN-530.patch
>
>
> # Extend the YARN {{Service}} interface as discussed in YARN-117
> # Implement the changes in {{AbstractService}} and {{FilterService}}.
> # Migrate all services in yarn-common to the more robust service model, test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities

2013-04-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620057#comment-13620057
 ] 

Bikas Saha commented on YARN-382:
-

+1 looks good to me.

> SchedulerUtils improve way normalizeRequest sets the resource capabilities
> --
>
> Key: YARN-382
> URL: https://issues.apache.org/jira/browse/YARN-382
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Thomas Graves
>Assignee: Zhijie Shen
> Attachments: YARN-382_1.patch, YARN-382_2.patch, YARN-382_demo.patch
>
>
> In YARN-370, we changed it from setting the capability to directly setting 
> memory and cores:
> -ask.setCapability(normalized);
> +ask.getCapability().setMemory(normalized.getMemory());
> +ask.getCapability().setVirtualCores(normalized.getVirtualCores());
> We did this because it is directly setting the values in the original 
> resource object passed in when the AM gets allocated and without it the AM 
> doesn't get the resource normalized correctly in the submission context. See 
> YARN-370 for more details.
> I think we should find a better way of doing this long term, one so we don't 
> have to keep adding things there when new resources are added, two because 
> its a bit confusing as to what its doing and prone to someone accidentally 
> breaking it in the future again.  Something closer to what Arun suggested in 
> YARN-370 would be better but we need to make sure all the places work and get 
> some more testing on it before putting it in. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-04-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-530:


Attachment: YARN-117changes.pdf

This is an overview of the changes, with explanations.

> Define Service model strictly, implement AbstractService for robust 
> subclassing, migrate yarn-common services
> -
>
> Key: YARN-530
> URL: https://issues.apache.org/jira/browse/YARN-530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-117changes.pdf
>
>
> # Extend the YARN {{Service}} interface as discussed in YARN-117
> # Implement the changes in {{AbstractService}} and {{FilterService}}.
> # Migrate all services in yarn-common to the more robust service model, test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-120) Make yarn-common services robust

2013-04-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-120.
-

   Resolution: Duplicate
Fix Version/s: 3.0.0

Superseded by YARN-530

> Make yarn-common services robust
> 
>
> Key: YARN-120
> URL: https://issues.apache.org/jira/browse/YARN-120
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>  Labels: yarn
> Fix For: 3.0.0
>
> Attachments: MAPREDUCE-4014.patch
>
>
> Review the yarn common services ({{CompositeService}}, 
> {{AbstractLivelinessMonitor}}) and make their service startup _and especially 
> shutdown_ more robust against out-of-lifecycle invocation and partially 
> complete initialization.
> Write tests for these where possible. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-121) Yarn services to throw a YarnException on invalid state changes

2013-04-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-121.
-

   Resolution: Duplicate
Fix Version/s: 3.0.0

Superseded by YARN-530

> Yarn services to throw a YarnException on invalid state changes
> --
>
> Key: YARN-121
> URL: https://issues.apache.org/jira/browse/YARN-121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.0.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The {{EnsureCurrentState()}} checks of services throw an 
> {{IllegalStateException}} if the state is wrong. If this were changed to 
> {{YarnException}}, wrapper services such as CompositeService could relay it 
> directly, instead of wrapping it in their own.
> The time to implement is mainly in changing the lifecycle test cases of the 
> MAPREDUCE-3939 subtasks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-04-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned YARN-530:
---

Assignee: Steve Loughran

> Define Service model strictly, implement AbstractService for robust 
> subclassing, migrate yarn-common services
> -
>
> Key: YARN-530
> URL: https://issues.apache.org/jira/browse/YARN-530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> # Extend the YARN {{Service}} interface as discussed in YARN-117
> # Implement the changes in {{AbstractService}} and {{FilterService}}.
> # Migrate all services in yarn-common to the more robust service model, test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-04-02 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-530:
---

 Summary: Define Service model strictly, implement AbstractService 
for robust subclassing, migrate yarn-common services
 Key: YARN-530
 URL: https://issues.apache.org/jira/browse/YARN-530
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran


# Extend the YARN {{Service}} interface as discussed in YARN-117
# Implement the changes in {{AbstractService}} and {{FilterService}}.
# Migrate all services in yarn-common to the more robust service model, test.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded

2013-04-02 Thread jian he (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jian he reassigned YARN-529:


Assignee: jian he

> MR app master clean staging dir when reboot command sent from RM while the MR 
> job succeeded
> ---
>
> Key: YARN-529
> URL: https://issues.apache.org/jira/browse/YARN-529
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: jian he
>Assignee: jian he
>
> The MR app master will clean the staging dir if the job has already succeeded 
> and is asked to reboot. The RM will consider this job unsuccessful and launch 
> further attempts; those attempts will fail because the staging dir has been cleaned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded

2013-04-02 Thread jian he (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jian he updated YARN-529:
-

Description: The MR app master will clean the staging dir if the job has already 
succeeded and is asked to reboot. The RM will consider this job unsuccessful and 
launch further attempts; those attempts will fail because the staging dir has 
been cleaned.

> MR app master clean staging dir when reboot command sent from RM while the MR 
> job succeeded
> ---
>
> Key: YARN-529
> URL: https://issues.apache.org/jira/browse/YARN-529
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: jian he
>
> The MR app master will clean the staging dir if the job has already succeeded 
> and is asked to reboot. The RM will consider this job unsuccessful and launch 
> further attempts; those attempts will fail because the staging dir has been cleaned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-122) CompositeService should clone the Configurations it passes to children

2013-04-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-122:


Priority: Minor  (was: Major)

> CompositeService should clone the Configurations it passes to children
> --
>
> Key: YARN-122
> URL: https://issues.apache.org/jira/browse/YARN-122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Steve Loughran
>Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> {{CompositeService.init(Configuration)}} saves the configuration passed in 
> *and* passes the same instance down to all managed services. This means a 
> change in the configuration of one child could propagate to all the others.
> Unless this is desired, the configuration should be cloned for each child.
> Fast and easy fix; tests can be added to those coming in MAPREDUCE-4014

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded

2013-04-02 Thread jian he (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jian he updated YARN-529:
-

Summary: MR app master clean staging dir when reboot command sent from RM 
while the MR job succeeded  (was: IF RM rebooted when MR job succeeded )

> MR app master clean staging dir when reboot command sent from RM while the MR 
> job succeeded
> ---
>
> Key: YARN-529
> URL: https://issues.apache.org/jira/browse/YARN-529
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: jian he
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-122) CompositeService should clone the Configurations it passes to children

2013-04-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620024#comment-13620024
 ] 

Steve Loughran commented on YARN-122:
-

This requires {{Configuration}} to implement {{clone()}} as a public method, so 
that any subclass of it, such as {{YarnConfiguration}}, will still be passed 
down to the children.
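
To illustrate the point with a small sketch (not a proposed patch): copying through 
the {{Configuration}} copy constructor keeps the values but produces a plain 
{{Configuration}}, which is why a public {{clone()}} would be needed to preserve 
the subclass.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ConfigCopySketch {
  public static void main(String[] args) {
    Configuration parent = new YarnConfiguration();
    // Values are copied, but the runtime type is now plain Configuration.
    Configuration childCopy = new Configuration(parent);
    System.out.println(childCopy instanceof YarnConfiguration); // prints false
  }
}
{code}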

> CompositeService should clone the Configurations it passes to children
> --
>
> Key: YARN-122
> URL: https://issues.apache.org/jira/browse/YARN-122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Steve Loughran
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> {{CompositeService.init(Configuration)}} saves the configuration passed in 
> *and* passes the same instance down to all managed services. This means a 
> change in the configuration of one child could propagate to all the others.
> Unless this is desired, the configuration should be cloned for each child.
> Fast and easy fix; tests can be added to those coming in MAPREDUCE-4014

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-529) IF RM rebooted when MR job succeeded

2013-04-02 Thread jian he (JIRA)
jian he created YARN-529:


 Summary: IF RM rebooted when MR job succeeded 
 Key: YARN-529
 URL: https://issues.apache.org/jira/browse/YARN-529
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: jian he




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality

2013-04-02 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620008#comment-13620008
 ] 

Bikas Saha commented on YARN-392:
-

Yes, YARN-398, but not the proposal currently in there. The alternative proposal 
is to have a new method in the AM-RM protocol with which the AM can blacklist 
nodes globally for all tasks (at all priorities) for that app.

> Make it possible to schedule to specific nodes without dropping locality
> 
>
> Key: YARN-392
> URL: https://issues.apache.org/jira/browse/YARN-392
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Sandy Ryza
> Attachments: YARN-392-1.patch, YARN-392.patch
>
>
> Currently it's not possible to specify scheduling requests for specific nodes 
> and nowhere else. The RM automatically relaxes locality to rack and * and 
> assigns non-specified machines to the app.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-528) Make IDs read only

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619989#comment-13619989
 ] 

Hadoop QA commented on YARN-528:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576592/YARN-528.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 50 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/646//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/646//console

This message is automatically generated.

> Make IDs read only
> --
>
> Key: YARN-528
> URL: https://issues.apache.org/jira/browse/YARN-528
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: YARN-528.txt, YARN-528.txt
>
>
> I really would like to rip out most if not all of the abstraction layer that 
> sits in-between Protocol Buffers, the RPC, and the actual user code.  We have 
> no plans to support any other serialization type, and the abstraction layer 
> just makes it more difficult to change protocols, makes changing them more 
> error prone, and slows down the objects themselves.  
> Completely doing that is a lot of work.  This JIRA is a first step towards 
> that.  It makes the various ID objects immutable.  If this patch is well 
> received I will try to go through other objects/classes of objects and update 
> them in a similar way.
> This is probably the last time we will be able to make a change like this 
> before 2.0 stabilizes and YARN APIs will not be able to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-04-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619957#comment-13619957
 ] 

Zhijie Shen commented on YARN-193:
--

{quote}
I am not sure if we should allow disabling of the max memory and max vcores 
setting. Was it supported earlier or does the patch introduce this support?
{quote}

Yes, the patch introduces the support. It was already there in your previous 
patch; I inherited it and added some description in yarn-default.xml. I'm fine 
either way on whether the function needs to be supported. One risk I can imagine 
if the function is supported is that the AM memory can exceed 
"yarn.nodemanager.resource.memory-mb" when DISABLE_RESOURCELIMIT_CHECK is set. 
Then the problem described in YARN-389 will occur.

{quote}
Question - should normalization of resource requests be done inside the 
scheduler or in the ApplicationMasterService itself which handles the allocate 
call?
{quote}
I think it is better to do normalization outside allocate, because allocate is 
not only called from ApplicationMasterService, and normalize does not need to be 
called every time allocate is called. For example, 
RMAppAttemptImpl#ScheduleTransition#transition doesn't require normalization 
because the resource has been validated during the submission stage. As another 
example, RMAppAttemptImpl#AMContainerAllocatedTransition#transition supplies an 
empty ask. 

{quote}
Unrelated to this patch but when throwing/logging errors related to configs, we 
should always point to the configuration property to let the user know which 
property needs to be changed. Please file a separate jira for the above.
{quote}
I'll do that, and modify the log message where the exception is thrown in this 
patch.

{quote}
For InvalidResourceRequestException, missing javadocs for class description.
{quote}
I'll add the description.

{quote}
If maxMemory or maxVcores is set to -1, what will happen when normalize() is 
called?
{quote}
The normalized value has no upper bound.
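
For illustration, the scheme under discussion looks roughly like this (a sketch, 
not the actual SchedulerUtils code; rounding the request up to a multiple of the 
minimum is assumed from the discussion above):

{code}
public final class NormalizeSketch {
  // Round the request up to a multiple of the minimum allocation and, unless
  // the maximum is disabled (treated as non-positive here), cap it at the max.
  static int normalizeMemory(int requestedMB, int minMB, int maxMB) {
    int normalized = ((Math.max(requestedMB, minMB) + minMB - 1) / minMB) * minMB;
    if (maxMB > 0) {
      normalized = Math.min(normalized, maxMB);
    }
    return normalized;
  }
}
{code}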


> Scheduler.normalizeRequest does not account for allocation requests that 
> exceed maximumAllocation limits 
> -
>
> Key: YARN-193
> URL: https://issues.apache.org/jira/browse/YARN-193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.2-alpha, 3.0.0
>Reporter: Hitesh Shah
>Assignee: Zhijie Shen
> Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
> MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.4.patch, 
> YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, 
> YARN-193.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-528) Make IDs read only

2013-04-02 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated YARN-528:
-

Attachment: YARN-528.txt

Upmerged

> Make IDs read only
> --
>
> Key: YARN-528
> URL: https://issues.apache.org/jira/browse/YARN-528
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: YARN-528.txt, YARN-528.txt
>
>
> I really would like to rip out most if not all of the abstraction layer that 
> sits in-between Protocol Buffers, the RPC, and the actual user code.  We have 
> no plans to support any other serialization type, and the abstraction layer 
> just makes it more difficult to change protocols, makes changing them more 
> error prone, and slows down the objects themselves.  
> Completely doing that is a lot of work.  This JIRA is a first step towards 
> that.  It makes the various ID objects immutable.  If this patch is well 
> received I will try to go through other objects/classes of objects and update 
> them in a similar way.
> This is probably the last time we will be able to make a change like this 
> before 2.0 stabilizes and YARN APIs will not be able to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-528) Make IDs read only

2013-04-02 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619911#comment-13619911
 ] 

Robert Joseph Evans commented on YARN-528:
--

The build failed because it needs to be upmerged again :(

> Make IDs read only
> --
>
> Key: YARN-528
> URL: https://issues.apache.org/jira/browse/YARN-528
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: YARN-528.txt
>
>
> I really would like to rip out most if not all of the abstraction layer that 
> sits in-between Protocol Buffers, the RPC, and the actual user code.  We have 
> no plans to support any other serialization type, and the abstraction layer 
> just makes it more difficult to change protocols, makes changing them more 
> error prone, and slows down the objects themselves.  
> Completely doing that is a lot of work.  This JIRA is a first step towards 
> that.  It makes the various ID objects immutable.  If this patch is well 
> received I will try to go through other objects/classes of objects and update 
> them in a similar way.
> This is probably the last time we will be able to make a change like this 
> before 2.0 stabilizes and YARN APIs will not be able to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619825#comment-13619825
 ] 

Hudson commented on YARN-475:
-

Integrated in Hadoop-Mapreduce-trunk #1389 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-475. Remove a unused constant in the public API - 
ApplicationConstants.AM_APP_ATTEMPT_ID_ENV. Contributed by Hitesh Shah. 
(Revision 1463033)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463033
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java


> Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in 
> an AM's environment
> ---
>
> Key: YARN-475
> URL: https://issues.apache.org/jira/browse/YARN-475
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Fix For: 2.0.5-beta
>
> Attachments: YARN-475.1.patch
>
>
> AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive 
> the application attempt id from the container id. 
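
For reference, deriving the attempt id typically looks like this (a sketch 
assuming the ConverterUtils helper; not part of this patch):

{code}
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class AttemptIdFromEnvSketch {
  public static void main(String[] args) {
    // Read the container id the NM put into the AM's environment, then
    // derive the application attempt id from it.
    String containerIdStr = System.getenv(ApplicationConstants.AM_CONTAINER_ID_ENV);
    ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
    ApplicationAttemptId attemptId = containerId.getApplicationAttemptId();
    System.out.println(attemptId);
  }
}
{code}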

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-447) applicationComparator improvement for CS

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619824#comment-13619824
 ] 

Hudson commented on YARN-447:
-

Integrated in Hadoop-Mapreduce-trunk #1389 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-447. Move ApplicationComparator in CapacityScheduler to use comparator 
in ApplicationId. Contributed by Nemon Lou. (Revision 1463405)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463405
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java


> applicationComparator improvement for CS
> 
>
> Key: YARN-447
> URL: https://issues.apache.org/jira/browse/YARN-447
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: nemon lou
>Assignee: nemon lou
>Priority: Minor
> Fix For: 2.0.5-beta
>
> Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, 
> YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch
>
>
> Now the compare code is:
> return a1.getApplicationId().getId() - a2.getApplicationId().getId();
> It will be replaced with:
> return a1.getApplicationId().compareTo(a2.getApplicationId());
> This brings some benefits:
> 1. It leaves the ApplicationId comparison logic to the ApplicationId class;
> 2. In a future HA mode the cluster timestamp may change, and the ApplicationId 
> class already takes care of this condition (see the sketch below).
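
The comparison the description relies on has roughly this shape (a sketch; the 
real {{ApplicationId.compareTo}} may differ in detail): order by cluster 
timestamp first, then by the sequential id.

{code}
// Illustrative stand-in for an ApplicationId-style key, not the real class.
final class AppIdKey implements Comparable<AppIdKey> {
  private final long clusterTimestamp;
  private final int id;

  AppIdKey(long clusterTimestamp, int id) {
    this.clusterTimestamp = clusterTimestamp;
    this.id = id;
  }

  @Override
  public int compareTo(AppIdKey other) {
    // Cluster timestamp first (covers an RM restart changing it), then the id.
    int byTimestamp = Long.compare(clusterTimestamp, other.clusterTimestamp);
    return byTimestamp != 0 ? byTimestamp : Integer.compare(id, other.id);
  }
}
{code}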

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619821#comment-13619821
 ] 

Hudson commented on YARN-516:
-

Integrated in Hadoop-Mapreduce-trunk #1389 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-516. Fix failure in TestContainerLocalizer caused by HADOOP-9357. 
Contributed by Andrew Wang. (Revision 1463362)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463362
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java


> TestContainerLocalizer.testContainerLocalizerMain is failing
> 
>
> Key: YARN-516
> URL: https://issues.apache.org/jira/browse/YARN-516
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Andrew Wang
> Fix For: 2.0.5-beta
>
> Attachments: YARN-516.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619822#comment-13619822
 ] 

Hudson commented on YARN-524:
-

Integrated in Hadoop-Mapreduce-trunk #1389 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-524 TestYarnVersionInfo failing if generated properties doesn't 
include an SVN URL (Revision 1463300)

 Result = SUCCESS
stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463300
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java


> TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
> --
>
> Key: YARN-524
> URL: https://issues.apache.org/jira/browse/YARN-524
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Affects Versions: 3.0.0
> Environment: OS/X with branch off github
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: YARN-524.patch
>
>
> {{TestYarnVersionInfo}} fails when the {{YarnVersionInfo.getUrl()}} call 
> returns {{Unknown}}, i.e. when that is the value inserted into the property file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619817#comment-13619817
 ] 

Hudson commented on YARN-309:
-

Integrated in Hadoop-Mapreduce-trunk #1389 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-309. Changed NodeManager to obtain heart-beat interval from the 
ResourceManager. Contributed by Xuan Gong. (Revision 1463346)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463346
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/YarnServerBuilderUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


> Make RM provide heartbeat interval to NM
> 
>
> Key: YARN-309
> URL: https://issues.apache.org/jira/browse/YARN-309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.0.5-beta
>
> Attachments: YARN-309.10.patch, YARN-309.11.patch, YARN-309.1.patch, 
> YARN-309-20130331.txt, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, 
> YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch, YARN-309.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-527) Local filecache mkdir fails

2013-04-02 Thread Knut O. Hellan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619797#comment-13619797
 ] 

Knut O. Hellan commented on YARN-527:
-

Digging through the code, it looks to me like the native Java File.mkdirs is 
used to actually create the directory, and it does not report why it failed. 
If that is the case, then this issue is really a feature request: YARN should 
be better at cleaning up old file caches so that this situation does not happen.
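
As a minimal illustration of that boolean-only contract (the path below is made up), java.io.File.mkdirs() reports failure without a cause, so any diagnostics have to be reconstructed by the caller:

{code}
import java.io.File;
import java.io.IOException;

public class MkdirsDiagnostics {
  public static void main(String[] args) throws IOException {
    // Made-up path, standing in for a NodeManager local filecache directory.
    File dir = new File("/tmp/yarn-local/filecache/example");
    if (!dir.mkdirs() && !dir.isDirectory()) {
      // mkdirs() only returns false; the closest we can get to a cause is
      // probing the parent directory ourselves.
      File parent = dir.getParentFile();
      throw new IOException("mkdir of " + dir + " failed (parent exists="
          + (parent != null && parent.exists()) + ", parent writable="
          + (parent != null && parent.canWrite()) + ")");
    }
  }
}
{code}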

> Local filecache mkdir fails
> ---
>
> Key: YARN-527
> URL: https://issues.apache.org/jira/browse/YARN-527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.0-alpha
> Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes 
> and six worker nodes.
>Reporter: Knut O. Hellan
>Priority: Minor
> Attachments: yarn-site.xml
>
>
> Jobs failed with no other explanation than this stack trace:
> 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1364591875320_0017_m_00_0: 
> java.io.IOException: mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932)
> at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
> at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Manually creating the directory worked. This behavior was common to at least 
> several nodes in the cluster.
> The situation was resolved by removing and recreating all 
> /disk?/yarn/local/filecache directories on all nodes.
> It is unclear whether Yarn struggled with the number of files or if there 
> were corrupt files in the caches. The situation was triggered by a node dying.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality

2013-04-02 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619778#comment-13619778
 ] 

Thomas Graves commented on YARN-392:


Bikas, when you say creating an API for blacklisting a set of nodes, are you 
basically referring to YARN-398 or something else?

> Make it possible to schedule to specific nodes without dropping locality
> 
>
> Key: YARN-392
> URL: https://issues.apache.org/jira/browse/YARN-392
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Sandy Ryza
> Attachments: YARN-392-1.patch, YARN-392.patch
>
>
> Currently it's not possible to specify scheduling requests for specific nodes 
> and nowhere else. The RM automatically relaxes locality to rack and * and 
> assigns non-specified machines to the app.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-447) applicationComparator improvement for CS

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619768#comment-13619768
 ] 

Hudson commented on YARN-447:
-

Integrated in Hadoop-Hdfs-trunk #1362 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/])
YARN-447. Move ApplicationComparator in CapacityScheduler to use comparator 
in ApplicationId. Contributed by Nemon Lou. (Revision 1463405)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463405
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java


> applicationComparator improvement for CS
> 
>
> Key: YARN-447
> URL: https://issues.apache.org/jira/browse/YARN-447
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: nemon lou
>Assignee: nemon lou
>Priority: Minor
> Fix For: 2.0.5-beta
>
> Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, 
> YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch
>
>
> The current comparison code is:
> return a1.getApplicationId().getId() - a2.getApplicationId().getId();
> It will be replaced with:
> return a1.getApplicationId().compareTo(a2.getApplicationId());
> This brings two benefits:
> 1. it leaves ApplicationId comparison logic to the ApplicationId class;
> 2. in a future HA mode the cluster timestamp may change, and the ApplicationId 
> class already handles that case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619766#comment-13619766
 ] 

Hudson commented on YARN-524:
-

Integrated in Hadoop-Hdfs-trunk #1362 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/])
YARN-524 TestYarnVersionInfo failing if generated properties doesn't 
include an SVN URL (Revision 1463300)

 Result = FAILURE
stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463300
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java


> TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
> --
>
> Key: YARN-524
> URL: https://issues.apache.org/jira/browse/YARN-524
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Affects Versions: 3.0.0
> Environment: OS/X with branch off github
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: YARN-524.patch
>
>
> {{TestYarnVersionInfo}} fails when the {{YarnVersionInfo.getUrl()}} call 
> returns {{Unknown}}, i.e. when that is the value inserted into the property file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619765#comment-13619765
 ] 

Hudson commented on YARN-516:
-

Integrated in Hadoop-Hdfs-trunk #1362 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/])
YARN-516. Fix failure in TestContainerLocalizer caused by HADOOP-9357. 
Contributed by Andrew Wang. (Revision 1463362)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463362
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java


> TestContainerLocalizer.testContainerLocalizerMain is failing
> 
>
> Key: YARN-516
> URL: https://issues.apache.org/jira/browse/YARN-516
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Andrew Wang
> Fix For: 2.0.5-beta
>
> Attachments: YARN-516.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619761#comment-13619761
 ] 

Hudson commented on YARN-309:
-

Integrated in Hadoop-Hdfs-trunk #1362 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/])
YARN-309. Changed NodeManager to obtain heart-beat interval from the 
ResourceManager. Contributed by Xuan Gong. (Revision 1463346)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463346
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/YarnServerBuilderUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


> Make RM provide heartbeat interval to NM
> 
>
> Key: YARN-309
> URL: https://issues.apache.org/jira/browse/YARN-309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.0.5-beta
>
> Attachments: YARN-309.10.patch, YARN-309.11.patch, YARN-309.1.patch, 
> YARN-309-20130331.txt, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, 
> YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch, YARN-309.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-528) Make IDs read only

2013-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619750#comment-13619750
 ] 

Hadoop QA commented on YARN-528:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12576553/YARN-528.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 49 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/645//console

This message is automatically generated.

> Make IDs read only
> --
>
> Key: YARN-528
> URL: https://issues.apache.org/jira/browse/YARN-528
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: YARN-528.txt
>
>
> I really would like to rip out most if not all of the abstraction layer that 
> sits in-between Protocol Buffers, the RPC, and the actual user code.  We have 
> no plans to support any other serialization type, and the abstraction layer 
> just makes it more difficult to change protocols, makes changing them more 
> error prone, and slows down the objects themselves.  
> Completely doing that is a lot of work.  This JIRA is a first step towards 
> that.  It makes the various ID objects immutable.  If this patch is well 
> received I will try to go through other objects/classes of objects and update 
> them in a similar way.
> This is probably the last time we will be able to make a change like this 
> before 2.0 stabilizes and YARN APIs will not be able to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-528) Make IDs read only

2013-04-02 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans reassigned YARN-528:


Assignee: Robert Joseph Evans

> Make IDs read only
> --
>
> Key: YARN-528
> URL: https://issues.apache.org/jira/browse/YARN-528
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: YARN-528.txt
>
>
> I really would like to rip out most if not all of the abstraction layer that 
> sits in-between Protocol Buffers, the RPC, and the actual user code.  We have 
> no plans to support any other serialization type, and the abstraction layer 
> just makes it more difficult to change protocols, makes changing them more 
> error prone, and slows down the objects themselves.  
> Completely doing that is a lot of work.  This JIRA is a first step towards 
> that.  It makes the various ID objects immutable.  If this patch is well 
> received I will try to go through other objects/classes of objects and update 
> them in a similar way.
> This is probably the last time we will be able to make a change like this 
> before 2.0 stabilizes and YARN APIs will not be able to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-528) Make IDs read only

2013-04-02 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated YARN-528:
-

Attachment: YARN-528.txt

This patch contains changes to both Map/Reduce IDs as well as YARN APIs.  I 
don't really want to split them up right now, but I am happy to file a separate 
JIRA for tracking purposes if the community decides this is a direction we want 
to go in.

> Make IDs read only
> --
>
> Key: YARN-528
> URL: https://issues.apache.org/jira/browse/YARN-528
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Robert Joseph Evans
> Attachments: YARN-528.txt
>
>
> I really would like to rip out most if not all of the abstraction layer that 
> sits in-between Protocol Buffers, the RPC, and the actual user code.  We have 
> no plans to support any other serialization type, and the abstraction layer 
> just makes it more difficult to change protocols, makes changing them more 
> error prone, and slows down the objects themselves.  
> Completely doing that is a lot of work.  This JIRA is a first step towards 
> that.  It makes the various ID objects immutable.  If this patch is well 
> received I will try to go through other objects/classes of objects and update 
> them in a similar way.
> This is probably the last time we will be able to make a change like this 
> before 2.0 stabilizes and YARN APIs will not be able to be changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-528) Make IDs read only

2013-04-02 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created YARN-528:


 Summary: Make IDs read only
 Key: YARN-528
 URL: https://issues.apache.org/jira/browse/YARN-528
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Robert Joseph Evans


I really would like to rip out most if not all of the abstraction layer that 
sits in-between Protocol Buffers, the RPC, and the actual user code.  We have 
no plans to support any other serialization type, and the abstraction layer 
just makes it more difficult to change protocols, makes changing them more 
error prone, and slows down the objects themselves.  

Completely doing that is a lot of work.  This JIRA is a first step towards 
that.  It makes the various ID objects immutable.  If this patch is well 
received I will try to go through other objects/classes of objects and update 
them in a similar way.

This is probably the last time we will be able to make a change like this 
before 2.0 stabilizes and YARN APIs will not be able to be changed.
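
As a rough, hypothetical sketch of the direction (not the attached patch), a read-only ID reduces to final fields assigned once in the constructor, with no setters:

{code}
// Hypothetical sketch only; the fields mirror ApplicationId but this is not
// the code in YARN-528.txt.
public final class ReadOnlyId implements Comparable<ReadOnlyId> {
  private final long clusterTimestamp;
  private final int id;

  public ReadOnlyId(long clusterTimestamp, int id) {
    this.clusterTimestamp = clusterTimestamp;
    this.id = id;
  }

  public long getClusterTimestamp() { return clusterTimestamp; }
  public int getId() { return id; }

  @Override
  public int compareTo(ReadOnlyId other) {
    // Order by cluster timestamp first, then by sequence number.
    int byCluster = Long.compare(clusterTimestamp, other.clusterTimestamp);
    return byCluster != 0 ? byCluster : Integer.compare(id, other.id);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ReadOnlyId)) {
      return false;
    }
    ReadOnlyId that = (ReadOnlyId) o;
    return clusterTimestamp == that.clusterTimestamp && id == that.id;
  }

  @Override
  public int hashCode() {
    return 31 * Long.hashCode(clusterTimestamp) + id;
  }
}
{code}

Because instances never change after construction, they can be shared freely and used as map keys without defensive copies.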

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-525) make CS node-locality-delay refreshable

2013-04-02 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-525:
---

Issue Type: Improvement  (was: Bug)
   Summary: make CS node-locality-delay refreshable  (was: 
yarn.scheduler.capacity.node-locality-delay doesn't change with rmadmin 
-refreshQueues)

> make CS node-locality-delay refreshable
> ---
>
> Key: YARN-525
> URL: https://issues.apache.org/jira/browse/YARN-525
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.0.3-alpha, 0.23.7
>Reporter: Thomas Graves
>
> the config yarn.scheduler.capacity.node-locality-delay doesn't change when 
> you change the value in capacity-scheduler.xml and then run yarn rmadmin 
> -refreshQueues.
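
For context, a minimal sketch of how such a value would be read from capacity-scheduler.xml (the property name comes from the report; the class name and resource handling are illustrative only). Per the report, the value is picked up when the scheduler initializes but is not re-read by yarn rmadmin -refreshQueues:

{code}
import org.apache.hadoop.conf.Configuration;

public class NodeLocalityDelayCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Load the scheduler configuration from the classpath; resource handling
    // here is illustrative only.
    conf.addResource("capacity-scheduler.xml");
    int delay = conf.getInt("yarn.scheduler.capacity.node-locality-delay", -1);
    System.out.println("node-locality-delay = " + delay);
  }
}
{code}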

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-447) applicationComparator improvement for CS

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619697#comment-13619697
 ] 

Hudson commented on YARN-447:
-

Integrated in Hadoop-Yarn-trunk #173 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/173/])
YARN-447. Move ApplicationComparator in CapacityScheduler to use comparator 
in ApplicationId. Contributed by Nemon Lou. (Revision 1463405)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463405
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java


> applicationComparator improvement for CS
> 
>
> Key: YARN-447
> URL: https://issues.apache.org/jira/browse/YARN-447
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: nemon lou
>Assignee: nemon lou
>Priority: Minor
> Fix For: 2.0.5-beta
>
> Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, 
> YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch
>
>
> The current comparison code is:
> return a1.getApplicationId().getId() - a2.getApplicationId().getId();
> It will be replaced with:
> return a1.getApplicationId().compareTo(a2.getApplicationId());
> This brings two benefits:
> 1. it leaves ApplicationId comparison logic to the ApplicationId class;
> 2. in a future HA mode the cluster timestamp may change, and the ApplicationId 
> class already handles that case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619695#comment-13619695
 ] 

Hudson commented on YARN-524:
-

Integrated in Hadoop-Yarn-trunk #173 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/173/])
YARN-524 TestYarnVersionInfo failing if generated properties doesn't 
include an SVN URL (Revision 1463300)

 Result = SUCCESS
stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463300
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java


> TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
> --
>
> Key: YARN-524
> URL: https://issues.apache.org/jira/browse/YARN-524
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Affects Versions: 3.0.0
> Environment: OS/X with branch off github
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: YARN-524.patch
>
>
> {{TestYarnVersionInfo}} fails when the {{YarnVersionInfo.getUrl()}} call 
> returns {{Unknown}}, i.e. when that is the value inserted into the property file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619694#comment-13619694
 ] 

Hudson commented on YARN-516:
-

Integrated in Hadoop-Yarn-trunk #173 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/173/])
YARN-516. Fix failure in TestContainerLocalizer caused by HADOOP-9357. 
Contributed by Andrew Wang. (Revision 1463362)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463362
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java


> TestContainerLocalizer.testContainerLocalizerMain is failing
> 
>
> Key: YARN-516
> URL: https://issues.apache.org/jira/browse/YARN-516
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Andrew Wang
> Fix For: 2.0.5-beta
>
> Attachments: YARN-516.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM

2013-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619690#comment-13619690
 ] 

Hudson commented on YARN-309:
-

Integrated in Hadoop-Yarn-trunk #173 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/173/])
YARN-309. Changed NodeManager to obtain heart-beat interval from the 
ResourceManager. Contributed by Xuan Gong. (Revision 1463346)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463346
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/YarnServerBuilderUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


> Make RM provide heartbeat interval to NM
> 
>
> Key: YARN-309
> URL: https://issues.apache.org/jira/browse/YARN-309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.0.5-beta
>
> Attachments: YARN-309.10.patch, YARN-309.11.patch, YARN-309.1.patch, 
> YARN-309-20130331.txt, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, 
> YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch, YARN-309.9.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-527) Local filecache mkdir fails

2013-04-02 Thread Knut O. Hellan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Knut O. Hellan updated YARN-527:


Attachment: yarn-site.xml

> Local filecache mkdir fails
> ---
>
> Key: YARN-527
> URL: https://issues.apache.org/jira/browse/YARN-527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.0-alpha
> Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes 
> and six worker nodes.
>Reporter: Knut O. Hellan
>Priority: Minor
> Attachments: yarn-site.xml
>
>
> Jobs failed with no other explanation than this stack trace:
> 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1364591875320_0017_m_00_0: 
> java.io.IOException: mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932)
> at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
> at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Manually creating the directory worked. This behavior was common to at least 
> several nodes in the cluster.
> The situation was resolved by removing and recreating all 
> /disk?/yarn/local/filecache directories on all nodes.
> It is unclear whether Yarn struggled with the number of files or if there 
> were corrupt files in the caches. The situation was triggered by a node dying.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-527) Local filecache mkdir fails

2013-04-02 Thread Knut O. Hellan (JIRA)
Knut O. Hellan created YARN-527:
---

 Summary: Local filecache mkdir fails
 Key: YARN-527
 URL: https://issues.apache.org/jira/browse/YARN-527
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.0-alpha
 Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes and 
six worker nodes.
Reporter: Knut O. Hellan
Priority: Minor


Jobs failed with no other explanation than this stack trace:

2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report 
from attempt_1364591875320_0017_m_00_0: java.io.IOException: 
mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932)
at 
org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
at 
org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Manually creating the directory worked. This behavior was common to at least 
several nodes in the cluster.

The situation was resolved by removing and recreating all 
/disk?/yarn/local/filecache directories on all nodes.

It is unclear whether Yarn struggled with the number of files or if there were 
corrupt files in the caches. The situation was triggered by a node dying.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


  1   2   >