[jira] [Updated] (YARN-457) Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl
[ https://issues.apache.org/jira/browse/YARN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-457: - Attachment: YARN-457-2.patch Sorry, my mistake. I changed the patch to call initLocalNewNodeReportList() before clearing this.updatedNodes. > Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl > > > Key: YARN-457 > URL: https://issues.apache.org/jira/browse/YARN-457 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api > Affects Versions: 2.0.3-alpha > Reporter: Sandy Ryza > Assignee: Kenji Kikushima > Priority: Minor > Labels: Newbie > Attachments: YARN-457-2.patch, YARN-457.patch > > > {code}
> if (updatedNodes == null) {
>   this.updatedNodes.clear();
>   return;
> }
> {code}
> If updatedNodes is already null, a NullPointerException is thrown. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
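The fix amounts to making the setter null-safe on both sides: lazily initialize the local list before clearing it, and treat a null argument as "clear". A minimal sketch of that pattern, with illustrative names (this is not the actual AllocateResponsePBImpl, and a String list stands in for List<NodeReport>):

```java
import java.util.ArrayList;
import java.util.List;

class UpdatedNodesHolder {
    private List<String> updatedNodes;   // stands in for List<NodeReport>

    // Lazily create the local list, mirroring initLocalNewNodeReportList().
    private void initLocalList() {
        if (updatedNodes == null) {
            updatedNodes = new ArrayList<>();
        }
    }

    public void setUpdatedNodes(List<String> nodes) {
        initLocalList();            // guard: never call clear() on a null field
        this.updatedNodes.clear();
        if (nodes != null) {
            this.updatedNodes.addAll(nodes);
        }
    }

    public List<String> getUpdatedNodes() {
        initLocalList();
        return updatedNodes;
    }
}
```

With this ordering, setting updated nodes from null to null simply leaves an empty list instead of throwing an NPE.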
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620651#comment-13620651 ] Zhijie Shen commented on YARN-193: -- {quote} Default value of max-vcores of 32 might be too high. {quote} Why was 32 chosen originally? In http://hortonworks.com/blog/apache-hadoop-yarn-background-and-an-overview/, it says: 2012 – 16+ cores, 48-96GB of RAM, 12x2TB or 12x3TB of disk. How about choosing 16? {quote} Why is conf being set 2 times for each value? Same for vcores. {quote} I'll fix the bug. > Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits > - > > Key: YARN-193 > URL: https://issues.apache.org/jira/browse/YARN-193 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.0.2-alpha, 3.0.0 > Reporter: Hitesh Shah > Assignee: Zhijie Shen > Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch > >
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620638#comment-13620638 ] Bikas Saha commented on YARN-193: - Default value of max-vcores of 32 might be too high. Why is conf being set 2 times for each value? Same for vcores. {code}
+conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 2048);
+conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 1024);
+try {
+  resourceManager.init(conf);
+  fail("Exception is expected because the min memory allocation is"
+      + " larger than the max memory allocation.");
+} catch (YarnException e) {
+  // Exception is expected.
+}
{code}
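For context, the behavior under discussion is: round a request up to a multiple of the minimum allocation, cap it at the maximum allocation, and reject at init time a minimum larger than the maximum. A hedged sketch of that logic with illustrative names (this is not the actual Scheduler.normalizeRequest signature, and only memory is modeled):

```java
class ResourceNormalizer {
    // Normalize a memory request (MB): round up to a multiple of minMb,
    // but never exceed maxMb. Throws if the configuration itself is invalid.
    static int normalizeMemory(int requested, int minMb, int maxMb) {
        if (minMb > maxMb) {
            throw new IllegalArgumentException(
                "minimum allocation " + minMb + " exceeds maximum " + maxMb);
        }
        // Round the request up to the next multiple of the minimum allocation.
        int normalized = Math.max(minMb,
            ((requested + minMb - 1) / minMb) * minMb);
        // Account for requests that exceed the maximum: clamp to maxMb.
        return Math.min(normalized, maxMb);
    }
}
```

For example, with min=1024 and max=8192, a 1500 MB request normalizes to 2048, while a 100000 MB request is clamped to 8192 instead of being granted as-is.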
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620617#comment-13620617 ] Hudson commented on YARN-467: - Integrated in Hadoop-trunk-Commit #3552 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3552/]) YARN-467. Modify public distributed cache to localize files such that no local directory hits unix file count limits and thus prevent job failures. Contributed by Omkar Vinit Joshi. (Revision 1463823) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463823 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheDirectoryManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java > Jobs fail during resource localization when public distributed-cache hits > unix directory limits > --- > > Key: YARN-467 > URL: https://issues.apache.org/jira/browse/YARN-467 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Fix For: 2.0.5-beta > > Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, > yarn-467-20130322.3.patch, yarn-467-20130322.patch, > yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, > yarn-467-20130401.patch, yarn-467-20130402.1.patch, > yarn-467-20130402.2.patch, yarn-467-20130402.patch > > > If we have multiple jobs which uses distributed cache with small size of > files, the directory limit reaches before reaching the cache size and fails > to create any directories in file cache (PUBLIC). The jobs start failing with > the below exception. 
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 > failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:
[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620610#comment-13620610 ] Hadoop QA commented on YARN-101: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576714/YARN-101.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/658//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/658//console This message is automatically generated. > If the heartbeat message loss, the nodestatus info of complete container > will loss too. > > > Key: YARN-101 > URL: https://issues.apache.org/jira/browse/YARN-101 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Environment: suse. 
>Reporter: xieguiming >Assignee: Xuan Gong >Priority: Minor > Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, > YARN-101.4.patch, YARN-101.5.patch, YARN-101.6.patch > > > see the red color: > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java > protected void startStatusUpdater() { > new Thread("Node Status Updater") { > @Override > @SuppressWarnings("unchecked") > public void run() { > int lastHeartBeatID = 0; > while (!isStopped) { > // Send heartbeat > try { > synchronized (heartbeatMonitor) { > heartbeatMonitor.wait(heartBeatInterval); > } > {color:red} > // Before we send the heartbeat, we get the NodeStatus, > // whose method removes completed containers. > NodeStatus nodeStatus = getNodeStatus(); > {color} > nodeStatus.setResponseId(lastHeartBeatID); > > NodeHeartbeatRequest request = recordFactory > .newRecordInstance(NodeHeartbeatRequest.class); > request.setNodeStatus(nodeStatus); > {color:red} >// But if the nodeHeartbeat fails, we've already removed the > containers away to know about it. We aren't handling a nodeHeartbeat failure > case here. 
> HeartbeatResponse response = > resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); >{color} > if (response.getNodeAction() == NodeAction.SHUTDOWN) { > LOG > .info("Recieved SHUTDOWN signal from Resourcemanager as > part of heartbeat," + > " hence shutting down."); > NodeStatusUpdaterImpl.this.stop(); > break; > } > if (response.getNodeAction() == NodeAction.REBOOT) { > LOG.info("Node is out of sync with ResourceManager," > + " hence rebooting."); > NodeStatusUpdaterImpl.this.reboot(); > break; > } > lastHeartBeatID = response.getResponseId(); > List containersToCleanup = response > .getContainersToCleanupList(); > if (containersToCleanup.size() != 0) { > dispatcher.getEventHandler().handle( > new CMgrCompletedContainersEvent(containersToCleanup)); > } > List appsToCleanup = > response.getApplicationsToCleanupList(); > //Only start tracking for keepAlive on FINISH_APP > trackAppsForKeepAlive(appsToCleanup); > if (appsToCleanup.size() != 0) { > dispatcher.getEventHandler().handle( > new CMgrCompletedAppsEvent(appsToCleanup)); > } > } catch (Throwable e) { > // TODO Better erro
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620608#comment-13620608 ] Vinod Kumar Vavilapalli commented on YARN-467: -- Perfect, the latest patch looks good. Checking it in. > Jobs fail during resource localization when public distributed-cache hits > unix directory limits > --- > > Key: YARN-467 > URL: https://issues.apache.org/jira/browse/YARN-467 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, > yarn-467-20130322.3.patch, yarn-467-20130322.patch, > yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, > yarn-467-20130401.patch, yarn-467-20130402.1.patch, > yarn-467-20130402.2.patch, yarn-467-20130402.patch > > > If we have multiple jobs which uses distributed cache with small size of > files, the directory limit reaches before reaching the cache size and fails > to create any directories in file cache (PUBLIC). The jobs start failing with > the below exception. 
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 > failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > we need to have a mechanism where in we can create directory hierarchy and > limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-101: --- Attachment: YARN-101.6.patch Recreated the test case to verify the status of all containers in every heartbeat. > If the heartbeat message loss, the nodestatus info of complete container > will loss too. > > > Key: YARN-101 > URL: https://issues.apache.org/jira/browse/YARN-101 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Environment: suse. > Reporter: xieguiming > Assignee: Xuan Gong > Priority: Minor > Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, YARN-101.4.patch, YARN-101.5.patch, YARN-101.6.patch > > > see the red color:
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
> protected void startStatusUpdater() {
>   new Thread("Node Status Updater") {
>     @Override
>     @SuppressWarnings("unchecked")
>     public void run() {
>       int lastHeartBeatID = 0;
>       while (!isStopped) {
>         // Send heartbeat
>         try {
>           synchronized (heartbeatMonitor) {
>             heartbeatMonitor.wait(heartBeatInterval);
>           }
> {color:red}
>           // Before we send the heartbeat, we get the NodeStatus,
>           // whose method removes completed containers.
>           NodeStatus nodeStatus = getNodeStatus();
> {color}
>           nodeStatus.setResponseId(lastHeartBeatID);
>           NodeHeartbeatRequest request = recordFactory
>               .newRecordInstance(NodeHeartbeatRequest.class);
>           request.setNodeStatus(nodeStatus);
> {color:red}
>           // But if the nodeHeartbeat fails, we've already removed the containers
>           // away to know about it. We aren't handling a nodeHeartbeat failure
>           // case here.
>           HeartbeatResponse response =
>               resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
> {color}
>           if (response.getNodeAction() == NodeAction.SHUTDOWN) {
>             LOG.info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat,"
>                 + " hence shutting down.");
>             NodeStatusUpdaterImpl.this.stop();
>             break;
>           }
>           if (response.getNodeAction() == NodeAction.REBOOT) {
>             LOG.info("Node is out of sync with ResourceManager,"
>                 + " hence rebooting.");
>             NodeStatusUpdaterImpl.this.reboot();
>             break;
>           }
>           lastHeartBeatID = response.getResponseId();
>           List<ContainerId> containersToCleanup = response
>               .getContainersToCleanupList();
>           if (containersToCleanup.size() != 0) {
>             dispatcher.getEventHandler().handle(
>                 new CMgrCompletedContainersEvent(containersToCleanup));
>           }
>           List<ApplicationId> appsToCleanup =
>               response.getApplicationsToCleanupList();
>           // Only start tracking for keepAlive on FINISH_APP
>           trackAppsForKeepAlive(appsToCleanup);
>           if (appsToCleanup.size() != 0) {
>             dispatcher.getEventHandler().handle(
>                 new CMgrCompletedAppsEvent(appsToCleanup));
>           }
>         } catch (Throwable e) {
>           // TODO Better error handling. Thread can die with the rest of the
>           // NM still running.
>           LOG.error("Caught exception in status-updater", e);
>         }
>       }
>     }
>   }.start();
> }
>
> private NodeStatus getNodeStatus() {
>   NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class);
>   nodeStatus.setNodeId(this.nodeId);
>   int numActiveContainers = 0;
>   List<ContainerStatus> containersStatuses = new ArrayList<ContainerStatus>();
>   for (Iterator<Entry<ContainerId, Container>> i =
>       this.context.getContainers().entrySet().iterator(); i.hasNext();) {
>     Entry<ContainerId, Container> e = i.next();
>     ContainerId containerId = e.getKey();
>     Container container = e.getValue();
>     // Clone the container to send it to the RM
>     org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus =
>         container.cloneAndGetContainerStatus();
>     containersStatuses.add(containerStatus);
>     ++numActiveContainers;
>     LOG.info("Sending out status for container: " + containerStatus);
> {color:red}
>     // Here is the part that removes the completed containers.
>     if (containerStatus.getState() == ContainerState.COMPLETE) {
>       // Remove
>       i.remove();
> {color}
>       LOG.info("Removed completed container " + containerId);
>     }
>
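One way to address the loss described above is to keep completed-container statuses in a pending buffer and drop them only once the RM acknowledges the heartbeat that carried them, so a lost heartbeat just causes a re-send. A sketch of that idea (illustrative only, not the actual YARN-101 patch; container ids are plain Strings here):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CompletedContainerBuffer {
    // Completed-container statuses keyed by the heartbeat id that carried them.
    private final Map<Integer, List<String>> pendingByHeartbeat = new HashMap<>();

    // Record which completed containers were sent with heartbeat `heartbeatId`.
    void sent(int heartbeatId, List<String> completed) {
        pendingByHeartbeat.put(heartbeatId, new ArrayList<>(completed));
    }

    // RM acknowledged heartbeat `ackId`: everything sent up to it is safe to drop.
    void acknowledged(int ackId) {
        pendingByHeartbeat.keySet().removeIf(id -> id <= ackId);
    }

    // Statuses to include in the next heartbeat: everything not yet acknowledged,
    // so nothing is lost if a previous heartbeat never reached the RM.
    List<String> toResend() {
        List<String> out = new ArrayList<>();
        for (List<String> statuses : pendingByHeartbeat.values()) {
            out.addAll(statuses);
        }
        return out;
    }
}
```

The key design point is that removal from NM-side state is driven by the RM's acknowledged response id rather than by merely building the NodeStatus, which is exactly the gap highlighted in red above.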
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620546#comment-13620546 ] Hadoop QA commented on YARN-467: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576705/yarn-467-20130402.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/657//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/657//console This message is automatically generated. 
[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-467: --- Attachment: yarn-467-20130402.2.patch
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620535#comment-13620535 ] Omkar Vinit Joshi commented on YARN-467: I have tested this code for the scenarios below:
* I used 4 local-dirs to check that localization gets distributed across them and that LocalCacheDirectoryManager manages each of them separately.
* I tested various values of "yarn.nodemanager.local-cache.max-files-per-directory": <=36, 37, 40, and much larger.
* I lowered the cache cleanup interval and the cache target size in MB to check that older files get removed from the cache and that LocalCacheDirectoryManager's sub-directories get reused.
* I verified that we never end up with more files or sub-directories in any local directory than what is specified in the configuration.
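The per-directory limit being tested can be met by spreading localized files over a shallow directory tree driven by a counter, so no single directory ever accumulates more entries than the configured cap. A sketch of that idea (illustrative naming and enumeration; not the actual LocalCacheDirectoryManager, whose exact scheme may differ), where the 36 symbols 0-9 and a-z explain the "<=36, 37" boundary values in the test list:

```java
class CacheDirAllocator {
    private static final String SYMBOLS = "0123456789abcdefghijklmnopqrstuvwxyz";
    private final int perDir;   // analogue of max-files-per-directory
    private long count = 0;     // total files allocated so far

    CacheDirAllocator(int perDir) {
        this.perDir = perDir;
    }

    // Returns the relative sub-directory for the next localized file:
    // "" for the first perDir files (the root), then one-level dirs
    // "0".."z", then two-level dirs "0/0", "0/1", ... so each directory
    // receives at most perDir files.
    String nextSubDir() {
        long bucket = count++ / perDir;  // which group of perDir files this is
        if (bucket == 0) {
            return "";                   // first perDir files stay in the root
        }
        long n = bucket - 1;             // enumerate buckets after the root
        StringBuilder sb = new StringBuilder();
        while (true) {
            sb.insert(0, SYMBOLS.charAt((int) (n % 36)));
            if (n < 36) {
                break;                   // single symbol left: done
            }
            sb.insert(0, '/');
            n = n / 36 - 1;              // move one level up the tree
        }
        return sb.toString();
    }
}
```

With perDir=2, for example, files land in "", "", "0", "0", "1", "1", ... so the root and each sub-directory hold at most two files.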
[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620501#comment-13620501 ] Hadoop QA commented on YARN-458: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576699/YARN-458.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/656//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/656//console This message is automatically generated. 
> YARN daemon addresses must be placed in many different configs > -- > > Key: YARN-458 > URL: https://issues.apache.org/jira/browse/YARN-458 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-458.patch > > > The YARN resourcemanager's address is included in four different configs: > yarn.resourcemanager.scheduler.address, > yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, > and yarn.resourcemanager.admin.address > A new user trying to configure a cluster needs to know the names of all these > four configs. > The same issue exists for nodemanagers. > It would be much easier if they could simply specify > yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports > for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Component/s: resourcemanager nodemanager > YARN daemon addresses must be placed in many different configs > -- > > Key: YARN-458 > URL: https://issues.apache.org/jira/browse/YARN-458 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-458.patch > > > The YARN resourcemanager's address is included in four different configs: > yarn.resourcemanager.scheduler.address, > yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, > and yarn.resourcemanager.admin.address > A new user trying to configure a cluster needs to know the names of all these > four configs. > The same issue exists for nodemanagers. > It would be much easier if they could simply specify > yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports > for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Affects Version/s: 2.0.3-alpha > YARN daemon addresses must be placed in many different configs > -- > > Key: YARN-458 > URL: https://issues.apache.org/jira/browse/YARN-458 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-458.patch > > > The YARN resourcemanager's address is included in four different configs: > yarn.resourcemanager.scheduler.address, > yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, > and yarn.resourcemanager.admin.address > A new user trying to configure a cluster needs to know the names of all these > four configs. > The same issue exists for nodemanagers. > It would be much easier if they could simply specify > yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports > for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Description: The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in. was: The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. It would be much easier if they could simply specify yarn.resourcemanager.address and default ports for the other ones would kick in. > YARN daemon addresses must be placed in many different configs > -- > > Key: YARN-458 > URL: https://issues.apache.org/jira/browse/YARN-458 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-458.patch > > > The YARN resourcemanager's address is included in four different configs: > yarn.resourcemanager.scheduler.address, > yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, > and yarn.resourcemanager.admin.address > A new user trying to configure a cluster needs to know the names of all these > four configs. > The same issue exists for nodemanagers. > It would be much easier if they could simply specify > yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports > for the other ones would kick in. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-458) Resource manager address must be placed in four different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620489#comment-13620489 ] Sandy Ryza commented on YARN-458: - Uploaded a patch that adds yarn.resourcemanager.hostname and yarn.nodemanager.hostname properties, and changes all the other configs to use ${yarn.resourcemanager.hostname} and ${yarn.nodemanager.hostname}. > Resource manager address must be placed in four different configs > - > > Key: YARN-458 > URL: https://issues.apache.org/jira/browse/YARN-458 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-458.patch > > > The YARN resourcemanager's address is included in four different configs: > yarn.resourcemanager.scheduler.address, > yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, > and yarn.resourcemanager.admin.address > A new user trying to configure a cluster needs to know the names of all these > four configs. > It would be much easier if they could simply specify > yarn.resourcemanager.address and default ports for the other ones would kick > in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
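A rough illustration of how the proposed hostname properties would behave: with ${...} expansion in the defaults, each per-service address config can fall back to the shared hostname plus its own port. The class below is a hand-rolled sketch of that expansion, not Hadoop's actual Configuration implementation, and the port numbers shown are only plausible defaults.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ${property} expansion letting one hostname
// property feed several address defaults, as the YARN-458 patch proposes.
public class HostnameExpansion {
  private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

  // Resolve ${property} references in a value against the given properties;
  // unresolved references are left as-is.
  public static String expand(String value, Map<String, String> props) {
    Matcher m = VAR.matcher(value);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      String replacement = props.getOrDefault(m.group(1), m.group(0));
      m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    // A new user now sets only the hostname...
    props.put("yarn.resourcemanager.hostname", "rm.example.com");
    // ...and the per-service defaults pick it up with their own ports
    // (8032/8030 are assumed example ports, not authoritative defaults).
    props.put("yarn.resourcemanager.address",
        "${yarn.resourcemanager.hostname}:8032");
    props.put("yarn.resourcemanager.scheduler.address",
        "${yarn.resourcemanager.hostname}:8030");
    System.out.println(expand(props.get("yarn.resourcemanager.address"), props));
  }
}
```
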
[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Summary: YARN daemon addresses must be placed in many different configs (was: Resource manager address must be placed in four different configs) > YARN daemon addresses must be placed in many different configs > -- > > Key: YARN-458 > URL: https://issues.apache.org/jira/browse/YARN-458 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-458.patch > > > The YARN resourcemanager's address is included in four different configs: > yarn.resourcemanager.scheduler.address, > yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, > and yarn.resourcemanager.admin.address > A new user trying to configure a cluster needs to know the names of all these > four configs. > It would be much easier if they could simply specify > yarn.resourcemanager.address and default ports for the other ones would kick > in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-458) Resource manager address must be placed in four different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Attachment: YARN-458.patch > Resource manager address must be placed in four different configs > - > > Key: YARN-458 > URL: https://issues.apache.org/jira/browse/YARN-458 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-458.patch > > > The YARN resourcemanager's address is included in four different configs: > yarn.resourcemanager.scheduler.address, > yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, > and yarn.resourcemanager.admin.address > A new user trying to configure a cluster needs to know the names of all these > four configs. > It would be much easier if they could simply specify > yarn.resourcemanager.address and default ports for the other ones would kick > in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-534) AM max attempts is not checked when RM restart and try to recover attempts
Jian He created YARN-534: Summary: AM max attempts is not checked when RM restart and try to recover attempts Key: YARN-534 URL: https://issues.apache.org/jira/browse/YARN-534 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Currently, AM max attempts is only checked when the current attempt fails, to decide whether to create a new attempt. If the RM restarts before the max attempts are exhausted, it will not clean the state store, so when the RM comes back it will retry the attempt again. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
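The missing check the issue describes can be reduced to a small sketch: on recovery, re-apply the same attempt-count rule that is applied when an attempt fails. The class and method names below are hypothetical stand-ins, not the RM's actual recovery code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the gap YARN-534 describes: after an RM restart,
// the recovered attempt count must be re-checked against the configured
// AM max attempts before launching yet another attempt.
public class RecoveryCheck {
  // Decide whether a recovered application may be given another attempt.
  // This is the same rule applied on normal attempt failure, re-applied
  // after restart instead of blindly retrying.
  public static boolean mayRetry(List<String> recoveredAttempts, int maxAttempts) {
    return recoveredAttempts.size() < maxAttempts;
  }

  public static void main(String[] args) {
    List<String> attempts = new ArrayList<>();
    attempts.add("appattempt_0001_000001");
    attempts.add("appattempt_0001_000002");
    // With max attempts = 2, the recovered app should be failed, not retried.
    System.out.println(mayRetry(attempts, 2)); // prints false
  }
}
```
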
[jira] [Commented] (YARN-495) Containers are not terminated when the NM is rebooted
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620448#comment-13620448 ] Hadoop QA commented on YARN-495: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576695/YARN-495.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/655//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/655//console This message is automatically generated. > Containers are not terminated when the NM is rebooted > - > > Key: YARN-495 > URL: https://issues.apache.org/jira/browse/YARN-495 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-495.1.patch, YARN-495.2.patch > > > When a reboot command is sent from RM, the node manager doesn't clean up the > containers while its stopping. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-495) Containers are not terminated when the NM is rebooted
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620443#comment-13620443 ] Jian He commented on YARN-495: -- Uploaded a patch that changes the NM behavior from REBOOT to RESYNC when the RM is restarted > Containers are not terminated when the NM is rebooted > - > > Key: YARN-495 > URL: https://issues.apache.org/jira/browse/YARN-495 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-495.1.patch, YARN-495.2.patch > > > When a reboot command is sent from RM, the node manager doesn't clean up the > containers while its stopping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-495) Containers are not terminated when the NM is rebooted
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-495: - Attachment: YARN-495.2.patch > Containers are not terminated when the NM is rebooted > - > > Key: YARN-495 > URL: https://issues.apache.org/jira/browse/YARN-495 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-495.1.patch, YARN-495.2.patch > > > When a reboot command is sent from RM, the node manager doesn't clean up the > containers while its stopping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-533) Pointing to the config property when throwing/logging the config-related exception
Zhijie Shen created YARN-533: Summary: Pointing to the config property when throwing/logging the config-related exception Key: YARN-533 URL: https://issues.apache.org/jira/browse/YARN-533 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen When throwing/logging errors related to configuration, we should always point to the configuration property to let users know which property needs to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
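The convention being proposed can be illustrated with a small sketch. The helper class below is hypothetical (it is not a YARN class), and the validation shown is a made-up example; the point is only that the error message names the offending property.

```java
// Sketch of the YARN-533 convention: configuration-related errors should
// name the offending property so users know exactly what to change.
public class ConfigErrors {
  // Parse a property value that must be a positive integer, embedding the
  // property name in any error raised.
  public static int parsePositiveInt(String property, String rawValue) {
    try {
      int v = Integer.parseInt(rawValue);
      if (v <= 0) {
        throw new IllegalArgumentException(
            "Invalid value " + v + " for " + property + "; it must be > 0");
      }
      return v;
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "Non-numeric value '" + rawValue + "' for " + property, e);
    }
  }

  public static void main(String[] args) {
    // Property name is a real YARN setting; the value is an example.
    System.out.println(
        parsePositiveInt("yarn.resourcemanager.am.max-attempts", "2"));
  }
}
```
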
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620412#comment-13620412 ] Hadoop QA commented on YARN-467: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576688/yarn-467-20130402.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/654//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/654//console This message is automatically generated. 
> Jobs fail during resource localization when public distributed-cache hits > unix directory limits > --- > > Key: YARN-467 > URL: https://issues.apache.org/jira/browse/YARN-467 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, > yarn-467-20130322.3.patch, yarn-467-20130322.patch, > yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, > yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.patch > > > If we have multiple jobs which uses distributed cache with small size of > files, the directory limit reaches before reaching the cache size and fails > to create any directories in file cache (PUBLIC). The jobs start failing with > the below exception. > java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 > failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > we need to have a mechanism where in we can create directory hierarchy and > limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
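The directory-hierarchy mechanism that the last sentence calls for can be sketched as follows. The class and method names are hypothetical (the actual patch's implementation may differ in detail); the idea is simply to spread localized files across nested subdirectories so that no single directory exceeds a fixed per-directory cap.

```java
// Sketch of the mechanism the YARN-467 description asks for: map a
// monotonically increasing file number onto a nested relative path so
// that each directory level holds at most filesPerDir entries.
public class HierarchicalPaths {
  // Return the relative subdirectory (possibly "") for the n-th file,
  // treating the containing-directory index as a base-filesPerDir number.
  public static String subDirFor(long n, int filesPerDir) {
    StringBuilder path = new StringBuilder();
    long dir = n / filesPerDir; // which directory the file lands in
    while (dir > 0) {
      path.insert(0, "/" + (dir % filesPerDir));
      dir /= filesPerDir;
    }
    return path.length() == 0 ? "" : path.substring(1);
  }

  public static void main(String[] args) {
    // With a cap of 36, files 0..35 go in the cache root, the next batch
    // into subdir "1", and so on, so no directory ever overflows.
    System.out.println(subDirFor(0, 36));  // prints "" (cache root)
    System.out.println(subDirFor(40, 36)); // prints "1"
  }
}
```
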
[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-467: --- Attachment: yarn-467-20130402.1.patch Fixing a test issue: that check is no longer valid. > Jobs fail during resource localization when public distributed-cache hits > unix directory limits > --- > > Key: YARN-467 > URL: https://issues.apache.org/jira/browse/YARN-467 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, > yarn-467-20130322.3.patch, yarn-467-20130322.patch, > yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, > yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.patch > > > If we have multiple jobs which uses distributed cache with small size of > files, the directory limit reaches before reaching the cache size and fails > to create any directories in file cache (PUBLIC). The jobs start failing with > the below exception. 
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 > failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > we need to have a mechanism where in we can create directory hierarchy and > limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620385#comment-13620385 ] Hadoop QA commented on YARN-193: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576680/YARN-193.12.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/653//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/653//console This message is automatically generated. 
> Scheduler.normalizeRequest does not account for allocation requests that > exceed maximumAllocation limits > - > > Key: YARN-193 > URL: https://issues.apache.org/jira/browse/YARN-193 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 3.0.0 >Reporter: Hitesh Shah >Assignee: Zhijie Shen > Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, > MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, > YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, > YARN-193.8.patch, YARN-193.9.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620384#comment-13620384 ] Hadoop QA commented on YARN-467: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576681/yarn-467-20130402.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/652//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/652//console This message is automatically generated. 
> Jobs fail during resource localization when public distributed-cache hits > unix directory limits > --- > > Key: YARN-467 > URL: https://issues.apache.org/jira/browse/YARN-467 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, > yarn-467-20130322.3.patch, yarn-467-20130322.patch, > yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, > yarn-467-20130401.patch, yarn-467-20130402.patch > > > If we have multiple jobs which uses distributed cache with small size of > files, the directory limit reaches before reaching the cache size and fails > to create any directories in file cache (PUBLIC). The jobs start failing with > the below exception. > java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 > failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > we need to have a mechanism where in we can create directory hierarchy and > limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-467: --- Attachment: yarn-467-20130402.patch Fixing the issues below: 1) all the formatting issues; 2) adding one additional test case checking the directory state transition FULL->NON_FULL->FULL; 3) javadoc warnings > Jobs fail during resource localization when public distributed-cache hits > unix directory limits > --- > > Key: YARN-467 > URL: https://issues.apache.org/jira/browse/YARN-467 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, > yarn-467-20130322.3.patch, yarn-467-20130322.patch, > yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, > yarn-467-20130401.patch, yarn-467-20130402.patch > > > If we have multiple jobs which uses distributed cache with small size of > files, the directory limit reaches before reaching the cache size and fails > to create any directories in file cache (PUBLIC). The jobs start failing with > the below exception. 
> java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 > failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > we need to have a mechanism where in we can create directory hierarchy and > limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-193: - Attachment: YARN-193.12.patch 1. Remove the DISABLE_RESOURCELIMIT_CHECK feature, and its related test cases. 2. Rewrite the log messages, and output them through LOG.warn. 3. Add javadocs for InvalidResourceRequestException. 4. Check whether thrown exception is InvalidResourceRequestException in TestClientRMService. 5. Add the test case of ask > max in TestSchedulerUtils. 6. Fixed other minor issues pointed out by Bikas and Hitesh (e.g., typo, unnecessary import). 7. Rebase with YARN-382. > Scheduler.normalizeRequest does not account for allocation requests that > exceed maximumAllocation limits > - > > Key: YARN-193 > URL: https://issues.apache.org/jira/browse/YARN-193 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 3.0.0 >Reporter: Hitesh Shah >Assignee: Zhijie Shen > Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, > MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, > YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, > YARN-193.8.patch, YARN-193.9.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-532) RMAdminProtocolPBClientImpl should implement Closeable
[ https://issues.apache.org/jira/browse/YARN-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620362#comment-13620362 ] Hadoop QA commented on YARN-532: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576674/YARN-532.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/651//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/651//console This message is automatically generated. > RMAdminProtocolPBClientImpl should implement Closeable > -- > > Key: YARN-532 > URL: https://issues.apache.org/jira/browse/YARN-532 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.0.3-alpha >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: YARN-532.txt > > > Required for RPC.stopProxy to work. Already done in most of the other > protocols. 
(MAPREDUCE-5117 addressing the one other protocol missing this) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620348#comment-13620348 ] Siddharth Seth commented on YARN-528: - bq. I really don't understand how this is supposed to work. How do we create fewer objects by wrapping them in more objects? I can see us doing something like deduping the objects that come over the wire, but I don't see how wrapping works here. Not compared to using Protos directly (which wasn't really an option), but compared to the alternative of converting only at the RPC layer. > Make IDs read only > -- > > Key: YARN-528 > URL: https://issues.apache.org/jira/browse/YARN-528 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Attachments: YARN-528.txt, YARN-528.txt > > > I really would like to rip out most if not all of the abstraction layer that > sits in-between Protocol Buffers, the RPC, and the actual user code. We have > no plans to support any other serialization type, and the abstraction layer > just makes it more difficult to change protocols, makes changing them more > error prone, and slows down the objects themselves. > Completely doing that is a lot of work. This JIRA is a first step towards > that. It makes the various ID objects immutable. If this patch is well > received I will try to go through other objects/classes of objects and update > them in a similar way. > This is probably the last time we will be able to make a change like this > before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-532) RMAdminProtocolPBClientImpl should implement Closeable
[ https://issues.apache.org/jira/browse/YARN-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-532: Attachment: YARN-532.txt Trivial fix. > RMAdminProtocolPBClientImpl should implement Closeable > -- > > Key: YARN-532 > URL: https://issues.apache.org/jira/browse/YARN-532 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.0.3-alpha >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: YARN-532.txt > > > Required for RPC.stopProxy to work. Already done in most of the other > protocols. (MAPREDUCE-5117 addressing the one other protocol missing this) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-532) RMAdminProtocolPBClientImpl should implement Closeable
Siddharth Seth created YARN-532: --- Summary: RMAdminProtocolPBClientImpl should implement Closeable Key: YARN-532 URL: https://issues.apache.org/jira/browse/YARN-532 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Required for RPC.stopProxy to work. Already done in most of the other protocols. (MAPREDUCE-5117 addressing the one other protocol missing this) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
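The one-line rationale ("Required for RPC.stopProxy to work") can be illustrated with a hedged sketch. The helper below is a stand-in for Hadoop's RPC.stopProxy, not the real implementation, and AdminClient is a hypothetical stand-in for RMAdminProtocolPBClientImpl; the point is only that a generic stop helper can release the wrapped connection solely through the java.io.Closeable interface.

```java
import java.io.Closeable;
import java.io.IOException;

// Stand-in for RPC.stopProxy (illustrative, not Hadoop's code): it can only
// close wrappers that expose close() via Closeable.
final class StopProxyHelper {
    static boolean stopProxy(Object proxy) {
        if (proxy instanceof Closeable) {
            try {
                ((Closeable) proxy).close(); // wrapper forwards to the real RPC proxy
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            return true;
        }
        return false; // without Closeable, the underlying connection would leak
    }
}

// Sketch of the fix's shape: the PB client implements Closeable and would
// delegate close() to RPC.stopProxy on its inner proxy object.
final class AdminClient implements Closeable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true; // in the real patch this would be RPC.stopProxy(this.proxy)
    }
}
```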
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620326#comment-13620326 ] Robert Joseph Evans commented on YARN-528: -- I am fine with splitting the MR changes from the YARN changes. Like I said, I put this out here more as a question of how we want to go about implementing these changes, and the test was more of a prototype example. I personally lean more towards using the *Proto classes directly. Why have something else wrapping it if we don't need it, even if it is a small and simple layer? The only reason I did not go that route here is because of toString(). With the IDs we rely on having ID.toString() turn into something very specific that can be parsed and turned back into an instance of the object. If I had the time I would trace down all places where we call toString on them and replace it with something else. I may just scale back the scope of the patch to look at ApplicationID to begin with and try to see if I can accomplish this. bq. Wrapping the object which came over the wire - with a goal of creating fewer objects. I really don't understand how this is supposed to work. How do we create fewer objects by wrapping them in more objects? I can see us doing something like deduping the objects that come over the wire, but I don't see how wrapping works here. > Make IDs read only > -- > > Key: YARN-528 > URL: https://issues.apache.org/jira/browse/YARN-528 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Attachments: YARN-528.txt, YARN-528.txt > > > I really would like to rip out most if not all of the abstraction layer that > sits in-between Protocol Buffers, the RPC, and the actual user code. 
We have > no plans to support any other serialization type, and the abstraction layer > just makes it more difficult to change protocols, makes changing them more > error prone, and slows down the objects themselves. > Completely doing that is a lot of work. This JIRA is a first step towards > that. It makes the various ID objects immutable. If this patch is well > received I will try to go through other objects/classes of objects and update > them in a similar way. > This is probably the last time we will be able to make a change like this > before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
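The toString() concern raised above — that ID.toString() must produce something parseable back into an instance — is compatible with making the IDs read only. A minimal hedged sketch (AppId is a hypothetical stand-in, not the YARN-528 patch, and its string format is simplified relative to YARN's real application IDs):

```java
// Hypothetical immutable ID (not the actual YARN-528 patch): final fields,
// no setters, and a toString()/fromString() pair that round-trips, which is
// the property the toString() discussion above depends on.
final class AppId {
    private final long clusterTimestamp;
    private final int id;

    AppId(long clusterTimestamp, int id) {
        this.clusterTimestamp = clusterTimestamp;
        this.id = id;
    }

    @Override
    public String toString() {
        return "application_" + clusterTimestamp + "_" + id;
    }

    /** Inverse of toString(): parses "application_<ts>_<n>" back into an AppId. */
    static AppId fromString(String s) {
        String[] parts = s.split("_");
        if (parts.length != 3 || !"application".equals(parts[0])) {
            throw new IllegalArgumentException("Not an application id: " + s);
        }
        return new AppId(Long.parseLong(parts[1]), Integer.parseInt(parts[2]));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof AppId)) {
            return false;
        }
        AppId other = (AppId) o;
        return clusterTimestamp == other.clusterTimestamp && id == other.id;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(clusterTimestamp) + id;
    }
}
```

Because the object is immutable, equals/hashCode stay stable, so the IDs can safely be used as map keys — one practical payoff of the read-only change.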
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620291#comment-13620291 ] Hadoop QA commented on YARN-479: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576654/YARN-479.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/650//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/650//console This message is automatically generated. > NM retry behavior for connection to RM should be similar for lost heartbeats > > > Key: YARN-479 > URL: https://issues.apache.org/jira/browse/YARN-479 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Jian He > Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, > YARN-479.4.patch, YARN-479.5.patch > > > Regardless of connection loss at the start or at an intermediate point, NM's > retry behavior to the RM should follow the same flow. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620284#comment-13620284 ] Hadoop QA commented on YARN-101: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576650/YARN-101.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/649//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/649//console This message is automatically generated. > If the heartbeat message loss, the nodestatus info of complete container > will loss too. > > > Key: YARN-101 > URL: https://issues.apache.org/jira/browse/YARN-101 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Environment: suse. 
>Reporter: xieguiming >Assignee: Xuan Gong >Priority: Minor > Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, > YARN-101.4.patch, YARN-101.5.patch > > > see the red color: > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java > protected void startStatusUpdater() { > new Thread("Node Status Updater") { > @Override > @SuppressWarnings("unchecked") > public void run() { > int lastHeartBeatID = 0; > while (!isStopped) { > // Send heartbeat > try { > synchronized (heartbeatMonitor) { > heartbeatMonitor.wait(heartBeatInterval); > } > {color:red} > // Before we send the heartbeat, we get the NodeStatus, > // whose method removes completed containers. > NodeStatus nodeStatus = getNodeStatus(); > {color} > nodeStatus.setResponseId(lastHeartBeatID); > > NodeHeartbeatRequest request = recordFactory > .newRecordInstance(NodeHeartbeatRequest.class); > request.setNodeStatus(nodeStatus); > {color:red} >// But if the nodeHeartbeat fails, we've already removed the > containers away to know about it. We aren't handling a nodeHeartbeat failure > case here. 
> HeartbeatResponse response = > resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); >{color} > if (response.getNodeAction() == NodeAction.SHUTDOWN) { > LOG > .info("Recieved SHUTDOWN signal from Resourcemanager as > part of heartbeat," + > " hence shutting down."); > NodeStatusUpdaterImpl.this.stop(); > break; > } > if (response.getNodeAction() == NodeAction.REBOOT) { > LOG.info("Node is out of sync with ResourceManager," > + " hence rebooting."); > NodeStatusUpdaterImpl.this.reboot(); > break; > } > lastHeartBeatID = response.getResponseId(); > List containersToCleanup = response > .getContainersToCleanupList(); > if (containersToCleanup.size() != 0) { > dispatcher.getEventHandler().handle( > new CMgrCompletedContainersEvent(containersToCleanup)); > } > List appsToCleanup = > response.getApplicationsToCleanupList(); > //Only start tracking for keepAlive on FINISH_APP > trackAppsForKeepAlive(appsToCleanup); > if (appsToCleanup.size() != 0) { > dispatcher.getEventHandler().handle( > new CMgrCompletedAppsEvent(appsToCleanup)); > } > } catch (Throwable e) { > // TODO Better error handling. Thread
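The bug highlighted in red above is that getNodeStatus() removes completed containers before the heartbeat is known to have succeeded. One way to sketch the fix direction (illustrative names only, not the actual YARN-101 patch) is to keep completed statuses in a pending buffer and drop them only once the RM's response arrives:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch, not the actual YARN-101 patch: completed-container
// statuses survive a lost heartbeat because removal happens only on ack.
final class CompletedContainerBuffer {
    private final List<String> pending = new ArrayList<>(); // completed container ids

    synchronized void recordCompleted(String containerId) {
        pending.add(containerId);
    }

    /** Copy to attach to the next heartbeat; nothing is removed yet. */
    synchronized List<String> snapshotForHeartbeat() {
        return new ArrayList<>(pending);
    }

    /** Invoked only after the RM's response for the heartbeat carrying 'acked'. */
    synchronized void onHeartbeatAcked(List<String> acked) {
        pending.removeAll(acked);
    }
}
```

If resourceTracker.nodeHeartbeat(...) throws, onHeartbeatAcked is never called for that snapshot, so the same statuses are re-sent on the next iteration instead of being lost.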
[jira] [Updated] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-479: - Attachment: YARN-479.5.patch > NM retry behavior for connection to RM should be similar for lost heartbeats > > > Key: YARN-479 > URL: https://issues.apache.org/jira/browse/YARN-479 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Jian He > Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, > YARN-479.4.patch, YARN-479.5.patch > > > Regardless of connection loss at the start or at an intermediate point, NM's > retry behavior to the RM should follow the same flow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620275#comment-13620275 ] Siddharth Seth commented on YARN-528: - Yep, we'll likely only support a single serialization, which at this point is PB. What the current approach was supposed to be good at: 1. Handling unknown fields (which proto already supports), which could make rolling upgrades etc. easier. 2. Wrapping the object which came over the wire - with a goal of creating fewer objects. I don't think the second point was really achieved, with the implementation getting complicated because of the interfaces being mutable, lists, and supporting chained sets (clc.getResource().setMemory()). I think point one should continue to be maintained. Do we want *Proto references in the APIs (client library versus Java Protocol definition)? At the moment, these are only referenced in the PBImpls - and hidden by the abstraction layer. What I don't like about the patch is Protos leaking into the object constructors. Instead, I think we could just use simple Java objects, with conversion at the RPC layer (I believe this is similar to the HDFS model). Unknown fields can be handled via byte[] arrays. I'm guessing very few of the interfaces actually need to be mutable - so in that sense, yes, this needs to be done before beta. OTOH, changing the PBImpl itself can be done at a later point if required. (Earlier is of course better, and I'd be happy to help with this. Was planning on working on YARN-442 before you started this work). > Make IDs read only > -- > > Key: YARN-528 > URL: https://issues.apache.org/jira/browse/YARN-528 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Attachments: YARN-528.txt, YARN-528.txt > > > I really would like to rip out most if not all of the abstraction layer that > sits in-between Protocol Buffers, the RPC, and the actual user code. 
We have > no plans to support any other serialization type, and the abstraction layer > just makes it more difficult to change protocols, makes changing them more > error prone, and slows down the objects themselves. > Completely doing that is a lot of work. This JIRA is a first step towards > that. It makes the various ID objects immutable. If this patch is well > received I will try to go through other objects/classes of objects and update > them in a similar way. > This is probably the last time we will be able to make a change like this > before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-486: -- Assignee: Xuan Gong (was: Bikas Saha) > Change startContainer NM API to accept Container as a parameter and make > ContainerLaunchContext user land > - > > Key: YARN-486 > URL: https://issues.apache.org/jira/browse/YARN-486 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Xuan Gong > > Currently, id, resource request, etc. need to be copied over from Container to > ContainerLaunchContext. This can be brittle. Also it leads to duplication of > information (such as Resource from CLC and Resource from Container and > Container.tokens). Sending Container directly to startContainer solves these > problems. It also makes CLC clean by only having stuff in it that is set by > the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620261#comment-13620261 ] Xuan Gong commented on YARN-101: 1. Use YarnServerBuilderUtils for constructing the node-heartbeat response 2. Use BuilderUtils to create ApplicationId, ContainerId, ContainerStatus, etc. 3. Recreated the test case as the last comment suggested > If the heartbeat message loss, the nodestatus info of complete container > will loss too. > > > Key: YARN-101 > URL: https://issues.apache.org/jira/browse/YARN-101 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Environment: suse. >Reporter: xieguiming >Assignee: Xuan Gong >Priority: Minor > Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, > YARN-101.4.patch, YARN-101.5.patch > > > see the red color: > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java > protected void startStatusUpdater() { > new Thread("Node Status Updater") { > @Override > @SuppressWarnings("unchecked") > public void run() { > int lastHeartBeatID = 0; > while (!isStopped) { > // Send heartbeat > try { > synchronized (heartbeatMonitor) { > heartbeatMonitor.wait(heartBeatInterval); > } > {color:red} > // Before we send the heartbeat, we get the NodeStatus, > // whose method removes completed containers. > NodeStatus nodeStatus = getNodeStatus(); > {color} > nodeStatus.setResponseId(lastHeartBeatID); > > NodeHeartbeatRequest request = recordFactory > .newRecordInstance(NodeHeartbeatRequest.class); > request.setNodeStatus(nodeStatus); > {color:red} >// But if the nodeHeartbeat fails, we've already removed the > containers away to know about it. We aren't handling a nodeHeartbeat failure > case here. 
> HeartbeatResponse response = > resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); >{color} > if (response.getNodeAction() == NodeAction.SHUTDOWN) { > LOG > .info("Recieved SHUTDOWN signal from Resourcemanager as > part of heartbeat," + > " hence shutting down."); > NodeStatusUpdaterImpl.this.stop(); > break; > } > if (response.getNodeAction() == NodeAction.REBOOT) { > LOG.info("Node is out of sync with ResourceManager," > + " hence rebooting."); > NodeStatusUpdaterImpl.this.reboot(); > break; > } > lastHeartBeatID = response.getResponseId(); > List containersToCleanup = response > .getContainersToCleanupList(); > if (containersToCleanup.size() != 0) { > dispatcher.getEventHandler().handle( > new CMgrCompletedContainersEvent(containersToCleanup)); > } > List appsToCleanup = > response.getApplicationsToCleanupList(); > //Only start tracking for keepAlive on FINISH_APP > trackAppsForKeepAlive(appsToCleanup); > if (appsToCleanup.size() != 0) { > dispatcher.getEventHandler().handle( > new CMgrCompletedAppsEvent(appsToCleanup)); > } > } catch (Throwable e) { > // TODO Better error handling. Thread can die with the rest of the > // NM still running. 
> LOG.error("Caught exception in status-updater", e); > } > } > } > }.start(); > } > private NodeStatus getNodeStatus() { > NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); > nodeStatus.setNodeId(this.nodeId); > int numActiveContainers = 0; > List containersStatuses = new > ArrayList(); > for (Iterator> i = > this.context.getContainers().entrySet().iterator(); i.hasNext();) { > Entry e = i.next(); > ContainerId containerId = e.getKey(); > Container container = e.getValue(); > // Clone the container to send it to the RM > org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = > container.cloneAndGetContainerStatus(); > containersStatuses.add(containerStatus); > ++numActiveContainers; > LOG.info("Sending out status for container: " + containerStatus); > {color:red} > // Here is the part that removes the completed containers. > if (containerStatus.getState() == ContainerState.COMPLETE) { >
[jira] [Updated] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-101: --- Attachment: YARN-101.5.patch > If the heartbeat message loss, the nodestatus info of complete container > will loss too. > > > Key: YARN-101 > URL: https://issues.apache.org/jira/browse/YARN-101 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Environment: suse. >Reporter: xieguiming >Assignee: Xuan Gong >Priority: Minor > Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, > YARN-101.4.patch, YARN-101.5.patch > > > see the red color: > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java > protected void startStatusUpdater() { > new Thread("Node Status Updater") { > @Override > @SuppressWarnings("unchecked") > public void run() { > int lastHeartBeatID = 0; > while (!isStopped) { > // Send heartbeat > try { > synchronized (heartbeatMonitor) { > heartbeatMonitor.wait(heartBeatInterval); > } > {color:red} > // Before we send the heartbeat, we get the NodeStatus, > // whose method removes completed containers. > NodeStatus nodeStatus = getNodeStatus(); > {color} > nodeStatus.setResponseId(lastHeartBeatID); > > NodeHeartbeatRequest request = recordFactory > .newRecordInstance(NodeHeartbeatRequest.class); > request.setNodeStatus(nodeStatus); > {color:red} >// But if the nodeHeartbeat fails, we've already removed the > containers away to know about it. We aren't handling a nodeHeartbeat failure > case here. 
> HeartbeatResponse response = > resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); >{color} > if (response.getNodeAction() == NodeAction.SHUTDOWN) { > LOG > .info("Recieved SHUTDOWN signal from Resourcemanager as > part of heartbeat," + > " hence shutting down."); > NodeStatusUpdaterImpl.this.stop(); > break; > } > if (response.getNodeAction() == NodeAction.REBOOT) { > LOG.info("Node is out of sync with ResourceManager," > + " hence rebooting."); > NodeStatusUpdaterImpl.this.reboot(); > break; > } > lastHeartBeatID = response.getResponseId(); > List containersToCleanup = response > .getContainersToCleanupList(); > if (containersToCleanup.size() != 0) { > dispatcher.getEventHandler().handle( > new CMgrCompletedContainersEvent(containersToCleanup)); > } > List appsToCleanup = > response.getApplicationsToCleanupList(); > //Only start tracking for keepAlive on FINISH_APP > trackAppsForKeepAlive(appsToCleanup); > if (appsToCleanup.size() != 0) { > dispatcher.getEventHandler().handle( > new CMgrCompletedAppsEvent(appsToCleanup)); > } > } catch (Throwable e) { > // TODO Better error handling. Thread can die with the rest of the > // NM still running. 
> LOG.error("Caught exception in status-updater", e); > } > } > } > }.start(); > } > private NodeStatus getNodeStatus() { > NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); > nodeStatus.setNodeId(this.nodeId); > int numActiveContainers = 0; > List containersStatuses = new > ArrayList(); > for (Iterator> i = > this.context.getContainers().entrySet().iterator(); i.hasNext();) { > Entry e = i.next(); > ContainerId containerId = e.getKey(); > Container container = e.getValue(); > // Clone the container to send it to the RM > org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = > container.cloneAndGetContainerStatus(); > containersStatuses.add(containerStatus); > ++numActiveContainers; > LOG.info("Sending out status for container: " + containerStatus); > {color:red} > // Here is the part that removes the completed containers. > if (containerStatus.getState() == ContainerState.COMPLETE) { > // Remove > i.remove(); > {color} > LOG.info("Removed completed container " + containerId); > } > } > nodeStatus.setContainersStatuses(containersStatuses); > LOG.debug(this.nodeId + " sendin
[jira] [Commented] (YARN-527) Local filecache mkdir fails
[ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620239#comment-13620239 ] Vinod Kumar Vavilapalli commented on YARN-527: -- Is there any difference between how NodeManager tried to create the dir and your manual creation? Like the user running the NM versus the user who manually created the dir? Can you reproduce this? If we can find out exactly why NM couldn't create it automatically, then we can do something about it. > Local filecache mkdir fails > --- > > Key: YARN-527 > URL: https://issues.apache.org/jira/browse/YARN-527 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.0-alpha > Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes > and six worker nodes. >Reporter: Knut O. Hellan >Priority: Minor > Attachments: yarn-site.xml > > > Jobs failed with no other explanation than this stack trace: > 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1364591875320_0017_m_00_0: > java.io.IOException: mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Manually creating the directory worked. This behavior was common to at least > several nodes in the cluster. > The situation was resolved by removing and recreating all > /disk?/yarn/local/filecache directories on all nodes. > It is unclear whether Yarn struggled with the number of files or if there > were corrupt files in the caches. The situation was triggered by a node dying. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-529) Succeeded MR job is retried by RM if finishApplicationMaster() call fails
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-529: Issue Type: Improvement (was: Sub-task) Parent: (was: YARN-128) > Succeeded MR job is retried by RM if finishApplicationMaster() call fails > - > > Key: YARN-529 > URL: https://issues.apache.org/jira/browse/YARN-529 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > > MR app master will clean staging dir, if the job is already succeeded and > asked to reboot. If the finishApplicationMaster call fails, RM will consider > this job unfinished and launch further attempts, further attempts will fail > because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-529) Succeeded MR job is retried by RM if finishApplicationMaster() call fails
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620235#comment-13620235 ] Bikas Saha commented on YARN-529: - This problem is related to RM Restart but independent of it. Even without restart, if for some reason unregister from the RM fails during MR app master shutdown, the app master will continue and delete the staging dir etc. Since the RM did not get an unregister, it will retry the MR app and all subsequent attempts will fail. > Succeeded MR job is retried by RM if finishApplicationMaster() call fails > - > > Key: YARN-529 > URL: https://issues.apache.org/jira/browse/YARN-529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > > MR app master will clean staging dir, if the job is already succeeded and > asked to reboot. If the finishApplicationMaster call fails, RM will consider > this job unfinished and launch further attempts, further attempts will fail > because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
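The ordering fix implied by this comment — do not delete the staging dir until the RM has acknowledged the unregister — can be sketched in a few lines of standalone Java. The Rm interface and shutdown() helper below are hypothetical stand-ins for the real finishApplicationMaster() RPC, not MR app master code:

```java
public class ShutdownOrder {
    /** Stand-in for the RM-facing unregister call; throws if the RPC fails. */
    interface Rm { void finishApplicationMaster() throws Exception; }

    /** Returns true only when the staging dir may be deleted: after the RM
     *  has seen the unregister. If the call fails, the staging dir is kept
     *  so a retried attempt can still run. Simplified sketch of the ordering
     *  discussed above, not the actual fix. */
    public static boolean shutdown(Rm rm) {
        try {
            rm.finishApplicationMaster();   // unregister first
        } catch (Exception e) {
            return false;                   // RM never saw the finish: keep staging dir
        }
        return true;                        // safe to clean staging dir now
    }
}
```

The original bug is the inverse ordering: the staging dir was cleaned regardless of whether the unregister succeeded.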
[jira] [Updated] (YARN-529) Succeeded MR job is retried by RM if finishApplicationMaster() call fails
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-529: Summary: Succeeded MR job is retried by RM if finishApplicationMaster() call fails (was: Succeeded RM job is retried by RM if finishApplicationMaster() call fails) > Succeeded MR job is retried by RM if finishApplicationMaster() call fails > - > > Key: YARN-529 > URL: https://issues.apache.org/jira/browse/YARN-529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > > MR app master will clean staging dir, if the job is already succeeded and > asked to reboot. If the finishApplicationMaster call fails, RM will consider > this job unfinished and launch further attempts, further attempts will fail > because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-529) Succeeded RM job is retried by RM if finishApplicationMaster() call fails
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-529: Summary: Succeeded RM job is retried by RM if finishApplicationMaster() call fails (was: MR app master clean staging dir when reboot command sent from RM while the MR job succeeded) > Succeeded RM job is retried by RM if finishApplicationMaster() call fails > - > > Key: YARN-529 > URL: https://issues.apache.org/jira/browse/YARN-529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > > MR app master will clean staging dir, if the job is already succeeded and > asked to reboot. If the finishApplicationMaster call fails, RM will consider > this job unfinished and launch further attempts, further attempts will fail > because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620230#comment-13620230 ] Bikas Saha commented on YARN-529: - By 1) you mean let RM accept finishApplicationAttempt() from the last attempt? > MR app master clean staging dir when reboot command sent from RM while the MR > job succeeded > --- > > Key: YARN-529 > URL: https://issues.apache.org/jira/browse/YARN-529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > > MR app master will clean staging dir, if the job is already succeeded and > asked to reboot. If the finishApplicationMaster call fails, RM will consider > this job unfinished and launch further attempts, further attempts will fail > because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-529: Description: MR app master will clean staging dir, if the job is already succeeded and asked to reboot. If the finishApplicationMaster call fails, RM will consider this job unfinished and launch further attempts, further attempts will fail because staging dir is cleaned (was: MR app master will clean staging dir, if the job is already succeeded and asked to reboot. RM will consider this job unsuccessful and launch further attempts, further attempts will fail because staging dir is cleaned) > MR app master clean staging dir when reboot command sent from RM while the MR > job succeeded > --- > > Key: YARN-529 > URL: https://issues.apache.org/jira/browse/YARN-529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > > MR app master will clean staging dir, if the job is already succeeded and > asked to reboot. If the finishApplicationMaster call fails, RM will consider > this job unfinished and launch further attempts, further attempts will fail > because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-531) RM nodes page should show time-since-last-heartbeat instead of absolute last-heartbeat time
Vinod Kumar Vavilapalli created YARN-531: Summary: RM nodes page should show time-since-last-heartbeat instead of absolute last-heartbeat time Key: YARN-531 URL: https://issues.apache.org/jira/browse/YARN-531 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Absolute last-heartbeat time is absolutely useless ;) We need to replace it with time since last heartbeat. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-531) RM nodes page should show time-since-last-heartbeat instead of absolute last-heartbeat time
[ https://issues.apache.org/jira/browse/YARN-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-531: - Labels: usability (was: ) > RM nodes page should show time-since-last-heartbeat instead of absolute > last-heartbeat time > --- > > Key: YARN-531 > URL: https://issues.apache.org/jira/browse/YARN-531 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Labels: usability > > Absolute last-heartbeat time is absolutely useless ;) We need to replace it > with time since last heartbeat. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
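The usability change requested here is purely a rendering change: keep the raw heartbeat timestamp and compute the delta at display time. A hedged sketch — the method name and output strings are made up for illustration, not what the RM nodes page ended up using:

```java
public class Elapsed {
    /** Render "time since last heartbeat" instead of an absolute timestamp.
     *  Clamps to zero in case of minor clock skew. */
    public static String since(long lastHeartbeatMillis, long nowMillis) {
        long s = Math.max(0, (nowMillis - lastHeartbeatMillis) / 1000);
        if (s < 60)   return s + " secs ago";
        if (s < 3600) return (s / 60) + " mins ago";
        return (s / 3600) + " hrs ago";
    }
}
```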
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620170#comment-13620170 ] Vinod Kumar Vavilapalli commented on YARN-528: -- bq. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. We have to make a call on this, don't think we explicitly took that decision yet. That said, I am inclined to throw it away but there were a couple of reasons why we put this in (like being able to pass through unidentified fields, e.g. from new RM to new NM via old AM). I would like a day or two to dig into those with knowledgeable folks offline. Thanks for your patience. Oh, and let's separate the tickets into MR and YARN only changes please - there isn't any pain as they are all orthogonal changes. > Make IDs read only > -- > > Key: YARN-528 > URL: https://issues.apache.org/jira/browse/YARN-528 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Attachments: YARN-528.txt, YARN-528.txt > > > I really would like to rip out most if not all of the abstraction layer that > sits in-between Protocol Buffers, the RPC, and the actual user code. We have > no plans to support any other serialization type, and the abstraction layer > just makes it more difficult to change protocols, makes changing them more > error prone, and slows down the objects themselves. > Completely doing that is a lot of work. This JIRA is a first step towards > that. It makes the various ID objects immutable. If this patch is well > received I will try to go through other objects/classes of objects and update > them in a similar way. > This is probably the last time we will be able to make a change like this > before 2.0 stabilizes and YARN APIs will not be able to be changed. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
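What "read only" buys here is concrete: a final class with final fields needs no synchronization, and its value can't drift after it has been used as a map key. A simplified sketch of an immutable ID in the spirit of this patch — AppId and its string format follow YARN's `application_<timestamp>_<seq>` convention, but the class itself is an illustrative stand-in, not YARN's actual ApplicationId:

```java
public final class AppId {
    private final long clusterTimestamp;
    private final int id;

    public AppId(long clusterTimestamp, int id) {
        this.clusterTimestamp = clusterTimestamp;
        this.id = id;
    }

    public long getClusterTimestamp() { return clusterTimestamp; }
    public int getId() { return id; }

    @Override public String toString() {
        // zero-padded sequence number, e.g. application_1364591875320_0017
        return String.format("application_%d_%04d", clusterTimestamp, id);
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof AppId)) return false;
        AppId a = (AppId) o;
        return a.clusterTimestamp == clusterTimestamp && a.id == id;
    }

    @Override public int hashCode() {
        return 31 * Long.hashCode(clusterTimestamp) + id;
    }
}
```

With no setters and a final class, equals/hashCode stay stable for the object's lifetime, which is exactly the property mutable PB-backed IDs could not guarantee.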
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620163#comment-13620163 ] Hadoop QA commented on YARN-117: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576620/YARN-117.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 28 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 33 warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy: org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup org.apache.hadoop.mapreduce.security.ssl.TestEncryptedShuffle org.apache.hadoop.mapred.TestNetworkedJob org.apache.hadoop.mapred.TestClusterMRNotification org.apache.hadoop.mapred.TestJobCounters org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner org.apache.hadoop.mapred.TestMiniMRClasspath org.apache.hadoop.mapred.TestBlockLimits org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers org.apache.hadoop.mapred.TestMiniMRChildTask org.apache.hadoop.mapreduce.security.TestMRCredentials org.apache.hadoop.mapreduce.v2.TestNonExistentJob org.apache.hadoop.mapreduce.v2.TestRMNMInfo org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.mapreduce.TestMapReduceLazyOutput org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution org.apache.hadoop.mapred.TestJobCleanup org.apache.hadoop.mapred.TestReduceFetch org.apache.hadoop.mapred.TestReduceFetchFromPartialMem org.apache.hadoop.mapred.TestMerge org.apache.hadoop.mapreduce.v2.TestMRJobs 
org.apache.hadoop.mapreduce.TestChild org.apache.hadoop.mapred.TestJobName org.apache.hadoop.mapred.TestLazyOutput org.apache.hadoop.mapreduce.security.TestBinaryTokenFile org.apache.hadoop.mapreduce.v2.TestUberAM org.apache.hadoop.mapred.TestMiniMRClientCluster org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService org.apache.hadoop.mapred.TestClusterMapReduceTestCase org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter org.apache.hadoop.ipc.TestSocketFactory org.apache.hadoop.mapred.TestJobSysDirWithDFS org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/648//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/648//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/648//artifact/tru
[jira] [Resolved] (YARN-442) The ID classes should be immutable
[ https://issues.apache.org/jira/browse/YARN-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-442. -- Resolution: Duplicate Assignee: (was: Xuan Gong) YARN-528 is fixing this, closing as duplicate. > The ID classes should be immutable > -- > > Key: YARN-442 > URL: https://issues.apache.org/jira/browse/YARN-442 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth > > ApplicationId, ApplicationAttemptId, ContainerId should be immutable. That > should allow for a simpler implementation as well as remove synchronization. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-528: - Issue Type: Sub-task (was: Improvement) Parent: YARN-386 > Make IDs read only > -- > > Key: YARN-528 > URL: https://issues.apache.org/jira/browse/YARN-528 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Attachments: YARN-528.txt, YARN-528.txt > > > I really would like to rip out most if not all of the abstraction layer that > sits in-between Protocol Buffers, the RPC, and the actual user code. We have > no plans to support any other serialization type, and the abstraction layer > just makes it more difficult to change protocols, makes changing them more > error prone, and slows down the objects themselves. > Completely doing that is a lot of work. This JIRA is a first step towards > that. It makes the various ID objects immutable. If this patch is well > received I will try to go through other objects/classes of objects and update > them in a similar way. > This is probably the last time we will be able to make a change like this > before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities
[ https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-382: - Fix Version/s: 2.0.5-beta > SchedulerUtils improve way normalizeRequest sets the resource capabilities > -- > > Key: YARN-382 > URL: https://issues.apache.org/jira/browse/YARN-382 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Thomas Graves >Assignee: Zhijie Shen > Fix For: 2.0.5-beta > > Attachments: YARN-382_1.patch, YARN-382_2.patch, YARN-382_demo.patch > > > In YARN-370, we changed it from setting the capability to directly setting > memory and cores: > -ask.setCapability(normalized); > +ask.getCapability().setMemory(normalized.getMemory()); > +ask.getCapability().setVirtualCores(normalized.getVirtualCores()); > We did this because it is directly setting the values in the original > resource object passed in when the AM gets allocated and without it the AM > doesn't get the resource normalized correctly in the submission context. See > YARN-370 for more details. > I think we should find a better way of doing this long term, one so we don't > have to keep adding things there when new resources are added, two because > its a bit confusing as to what its doing and prone to someone accidentally > breaking it in the future again. Something closer to what Arun suggested in > YARN-370 would be better but we need to make sure all the places work and get > some more testing on it before putting it in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities
[ https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620097#comment-13620097 ] Hudson commented on YARN-382: - Integrated in Hadoop-trunk-Commit #3549 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3549/]) YARN-382. SchedulerUtils improve way normalizeRequest sets the resource capabilities (Zhijie Shen via bikas) (Revision 1463653) Result = SUCCESS bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463653 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java > SchedulerUtils improve way normalizeRequest sets the resource capabilities > -- > > Key: YARN-382 > URL: https://issues.apache.org/jira/browse/YARN-382 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Thomas Graves >Assignee: Zhijie Shen > Attachments: YARN-382_1.patch, YARN-382_2.patch, YARN-382_demo.patch > > > In YARN-370, we changed it from setting the capability to directly setting > memory and cores: > -ask.setCapability(normalized); > +ask.getCapability().setMemory(normalized.getMemory()); > 
+ask.getCapability().setVirtualCores(normalized.getVirtualCores()); > We did this because it is directly setting the values in the original > resource object passed in when the AM gets allocated and without it the AM > doesn't get the resource normalized correctly in the submission context. See > YARN-370 for more details. > I think we should find a better way of doing this long term, one so we don't > have to keep adding things there when new resources are added, two because > its a bit confusing as to what its doing and prone to someone accidentally > breaking it in the future again. Something closer to what Arun suggested in > YARN-370 would be better but we need to make sure all the places work and get > some more testing on it before putting it in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
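The subtlety in the quoted diff is aliasing: the submission context and the scheduler hold references to the same Resource object, so only in-place mutation is visible to both. A stripped-down illustration with toy Resource/ResourceRequest classes (not the YARN records API):

```java
public class NormalizeDemo {
    static final class Resource {
        int memory; int vcores;
        Resource(int m, int v) { memory = m; vcores = v; }
    }
    static final class ResourceRequest {
        Resource capability;
        ResourceRequest(Resource c) { capability = c; }
    }

    /** Mirrors the YARN-370 change: mutate the shared Resource in place,
     *  so every holder of the original object sees the normalized values. */
    static void normalizeInPlace(ResourceRequest ask, Resource normalized) {
        ask.capability.memory = normalized.memory;
        ask.capability.vcores = normalized.vcores;
    }

    /** The setCapability-style replacement: the request now points at a new
     *  object, and anything still holding the original never sees the change. */
    static void normalizeByReplacing(ResourceRequest ask, Resource normalized) {
        ask.capability = normalized;
    }
}
```

This is also why the approach is fragile, as the description notes: each newly added resource type needs another explicit field copy in normalizeInPlace, and forgetting one silently breaks normalization.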
[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620096#comment-13620096 ] Hadoop QA commented on YARN-530: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576617/YARN-530.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 33 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/647//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/647//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/647//console This message is automatically generated. 
> Define Service model strictly, implement AbstractService for robust > subclassing, migrate yarn-common services > - > > Key: YARN-530 > URL: https://issues.apache.org/jira/browse/YARN-530 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117changes.pdf, YARN-530.patch > > > # Extend the YARN {{Service}} interface as discussed in YARN-117 > # Implement the changes in {{AbstractService}} and {{FilterService}}. > # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620092#comment-13620092 ] jian he commented on YARN-529: -- Several solutions: 1. Let the RM accept old attempts. In the current code, the RM will raise an exception because of the unrecognized attempt and consider the job unsuccessful. 2. Only clean the staging dir after the AM successfully unregisters with the RM. We can use a flag to indicate this, or modify the state machine to transition from SUCCEEDED to REBOOT when JOB_AM_REBOOT is received. The potential problem is that, by the time the job transitions to the SUCCEEDED state, some job-success metrics have already been triggered. > MR app master clean staging dir when reboot command sent from RM while the MR > job succeeded > --- > > Key: YARN-529 > URL: https://issues.apache.org/jira/browse/YARN-529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: jian he >Assignee: jian he > > MR app master will clean staging dir, if the job is already succeeded and > asked to reboot. RM will consider this job unsuccessful and launch further > attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620091#comment-13620091 ] Bikas Saha commented on YARN-193: - Also, why are there so many normalize functions and why are we creating a new Resource object every time we normalize? We should fix this in a different jira though. > Scheduler.normalizeRequest does not account for allocation requests that > exceed maximumAllocation limits > - > > Key: YARN-193 > URL: https://issues.apache.org/jira/browse/YARN-193 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 3.0.0 >Reporter: Hitesh Shah >Assignee: Zhijie Shen > Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, > MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.4.patch, > YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, > YARN-193.9.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117.patch This is the across-all-yarn-projects patch (plus HADOOP-9447) just to show what the combined patch looks and tests like. YARN-530 contains the changes to yarn-common which should be the first step. (This patch contains those) > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117.patch > > > Having played with the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop safely without requiring all fields to > be non null. > Before anyone points out that the {{stop()}} operations assume that all > fields are valid; and if called before a {{start()}} they will NPE; > MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix > for this. It is independent of the rest of the issues in this doc but it will > aid making {{stop()}} execute from all states other than "stopped". > MAPREDUCE-3502 is too big a patch and needs to be broken down for easier > review and take up; this can be done with issues linked to this one. > h2. 
AbstractService doesn't prevent duplicate state change requests. > The {{ensureState()}} checks to verify whether or not a state transition is > allowed from the current state are performed in the base {{AbstractService}} > class -yet subclasses tend to call this *after* their own {{init()}}, > {{start()}} & {{stop()}} operations. This means that these operations can be > performed out of order, and even if the outcome of the call is an exception, > all actions performed by the subclasses will have taken place. MAPREDUCE-3877 > demonstrates this. > This is a tricky one to address. In HADOOP-3128 I used a base class instead > of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods > {{final}}. These methods would do the checks, and then invoke protected inner > methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to > retrofit the same behaviour to everything that extends {{AbstractService}} > -something that must be done before the class is considered stable (because > once the lifecycle methods are declared final, all subclasses that are out of > the source tree will need fixing by the respective developers). > h2. AbstractService state change doesn't defend against race conditions. > There are no concurrency locks on the state transitions. Whatever fix for wrong > state calls is added should correct this to prevent re-entrancy, such as > {{stop()}} being called from two threads. > h2. Static methods to choreograph lifecycle operations > Helper methods to move things through lifecycles. init->start is common, > stop-if-service!=null another. Some static methods can execute these, and > even call {{stop()}} if {{init()}} raises an exception. These could go into a > class {{ServiceOps}} in the same package. These can be used by those services > that wrap other services, and help manage more robust shutdowns. > h2. state transition failures are something that registered service listeners > may wish to be informed of. 
> When a state transition fails a {{RuntimeException}} can be thrown -and the > service listeners are not informed as the notification point isn't reached. > They may wish to know this, especially for management and diagnostics. > *Fix:* extend {{ServiceStateChangeListener}} with a callback such as > {{stateChangeFailed(Service service,Service.State targeted-state, > RuntimeException e)}} that is invoked from the (final) state change methods > in the {{AbstractService}} class (once they delegate to their inner > {{innerStart()}}, {{innerStop()}} methods); make it a no-op on the existing > implementations of the interface. > h2. Service listener failures not handled > Is this an error or not? Log and ignore may not be what is desired. > *Proposed:* during {{stop()}} any exception by a listener is caught and > discarded, t
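The lifecycle rules proposed above (stop() legal from every state, final lifecycle methods that always run the state check, a lock against racing transitions) can be sketched as follows. This is a minimal illustration only: {{MiniService}}, {{innerStart()}} and {{innerStop()}} are stand-in names, not the actual YARN API.

```java
// Illustrative sketch: MiniService stands in for AbstractService.
// It shows the three fixes discussed above: final lifecycle methods so the
// state check can never be skipped or reordered by a subclass, synchronized
// transitions to prevent re-entrancy, and a stop() that is valid from every
// state and tolerates null (never-initialized) fields.
public class MiniService {
    public enum State { NOTINITED, STARTED, STOPPED }

    private State state = State.NOTINITED;
    private AutoCloseable resource; // may still be null if start() never ran

    // final: subclasses override innerStart(), so the check always runs first
    public final synchronized void start() {
        if (state == State.STARTED) return;      // duplicate start: no-op
        if (state == State.STOPPED) throw new IllegalStateException("already stopped");
        innerStart();
        state = State.STARTED;
    }

    // valid from all states, re-entrant, and must not NPE on unset fields
    public final synchronized void stop() {
        if (state == State.STOPPED) return;      // duplicate stop: no-op
        try {
            innerStop();
        } finally {
            state = State.STOPPED;               // stopped even if release failed
        }
    }

    protected void innerStart() {
        resource = () -> { };                    // stand-in for a real resource
    }

    protected void innerStop() {
        if (resource != null) {                  // stop-before-start is safe
            try { resource.close(); } catch (Exception ignored) { }
        }
    }

    public synchronized State getState() { return state; }
}
```

With this shape, calling stop() before start(), or twice in a row, is harmless, which is exactly the robustness the MAPREDUCE-3431/MAPREDUCE-3502 discussion asks for.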
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620066#comment-13620066 ] Bikas Saha commented on YARN-193: - Can we check that we are getting the expected exception and not some other one? {code} +try { + rmService.submitApplication(submitRequest); + Assert.fail("Application submission should fail because"); +} catch (YarnRemoteException e) { + // Exception is expected +} + } {code} Setting the same config twice? In the second set, why not use a -ve value instead of the DISABLE value? It's not clear whether we want to disable the check or set a -ve value. Same for others. {code} +conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 0); +conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, +ResourceCalculator.DISABLE_RESOURCELIMIT_CHECK); +try { + resourceManager.init(conf); + fail("Exception is expected because the min memory allocation is" + + " non-positive."); +} catch (YarnException e) { + // Exception is expected. {code} Let's also add a test for the case when memory is more than max. Normalize should always reduce that to max. Same for DRF {code} +// max is not a multiple of min +maxResource = Resources.createResource(maxMemory - 10, 0); +ask.setCapability(Resources.createResource(maxMemory - 100)); +// multiple of minMemory > maxMemory, then reduce to maxMemory +SchedulerUtils.normalizeRequest(ask, resourceCalculator, null, +minResource, maxResource); +assertEquals(maxResource.getMemory(), ask.getCapability().getMemory()); } {code} Rename testAppSubmitError() to show that it's testing an invalid resource request? TestAMRMClient. Why is this change needed? {code} +amResource.setMemory( +YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB); +amContainer.setResource(amResource); {code} Don't we need to throw? 
{code} + } catch (InvalidResourceRequestException e) { +LOG.info("Resource request was not able to be alloacated for" + +" application attempt " + appAttemptId + " because it" + +" failed to pass the validation. " + e.getMessage()); +RPCUtil.getRemoteException(e); + } {code} typo {code} +// validate scheduler vcors allocation setting {code} This will need to be rebased after YARN-382 which I am going to commit shortly. I am fine with requiring that a max allocation limit be set. We should also make sure that the max allocation from config can be matched by at least 1 machine in the cluster. That should be a different jira. IMO, Normalization should be called only inside the scheduler. It is an artifact of the scheduler logic. Nothing in the RM requires resources to be normalized to a multiple of min. Only the scheduler needs it to make its life easier and it could choose to not do so. > Scheduler.normalizeRequest does not account for allocation requests that > exceed maximumAllocation limits > - > > Key: YARN-193 > URL: https://issues.apache.org/jira/browse/YARN-193 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 3.0.0 >Reporter: Hitesh Shah >Assignee: Zhijie Shen > Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, > MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.4.patch, > YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, > YARN-193.9.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
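The review comments above boil down to one normalization rule: round the request up to a multiple of the minimum, then cap it at the maximum, even when the maximum is not itself a multiple of the minimum. A hedged sketch of that rule follows; the class and method names are illustrative, not the actual SchedulerUtils/ResourceCalculator code.

```java
// Illustrative sketch of the normalization rule under discussion; the real
// implementation lives in SchedulerUtils and differs in detail.
public final class NormalizeSketch {

    // Round `requested` MB up to a multiple of minMB, then clamp to maxMB.
    public static int normalizeMemory(int requested, int minMB, int maxMB) {
        if (requested < minMB) {
            requested = minMB;                    // never below the minimum
        }
        // round up to the next multiple of minMB
        int normalized = ((requested + minMB - 1) / minMB) * minMB;
        // the rounded value can overshoot max when max is not a multiple of
        // min -- per the review above, reduce it to max in that case
        return Math.min(normalized, maxMB);
    }
}
```

The "max is not a multiple of min" test case in the quoted patch exercises exactly the final clamp: a request just under max rounds up past max and must come back down to it.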
[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-530: Attachment: YARN-530.patch This is the subset of YARN-117 for yarn-common > Define Service model strictly, implement AbstractService for robust > subclassing, migrate yarn-common services > - > > Key: YARN-530 > URL: https://issues.apache.org/jira/browse/YARN-530 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117changes.pdf, YARN-530.patch > > > # Extend the YARN {{Service}} interface as discussed in YARN-117 > # Implement the changes in {{AbstractService}} and {{FilterService}}. > # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities
[ https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620057#comment-13620057 ] Bikas Saha commented on YARN-382: - +1 looks good to me. > SchedulerUtils improve way normalizeRequest sets the resource capabilities > -- > > Key: YARN-382 > URL: https://issues.apache.org/jira/browse/YARN-382 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Thomas Graves >Assignee: Zhijie Shen > Attachments: YARN-382_1.patch, YARN-382_2.patch, YARN-382_demo.patch > > > In YARN-370, we changed it from setting the capability to directly setting > memory and cores: > -ask.setCapability(normalized); > +ask.getCapability().setMemory(normalized.getMemory()); > +ask.getCapability().setVirtualCores(normalized.getVirtualCores()); > We did this because it is directly setting the values in the original > resource object passed in when the AM gets allocated and without it the AM > doesn't get the resource normalized correctly in the submission context. See > YARN-370 for more details. > I think we should find a better way of doing this long term, one so we don't > have to keep adding things there when new resources are added, two because > its a bit confusing as to what its doing and prone to someone accidentally > breaking it in the future again. Something closer to what Arun suggested in > YARN-370 would be better but we need to make sure all the places work and get > some more testing on it before putting it in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-530: Attachment: YARN-117changes.pdf this is an overview of the changes, with explanations > Define Service model strictly, implement AbstractService for robust > subclassing, migrate yarn-common services > - > > Key: YARN-530 > URL: https://issues.apache.org/jira/browse/YARN-530 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117changes.pdf > > > # Extend the YARN {{Service}} interface as discussed in YARN-117 > # Implement the changes in {{AbstractService}} and {{FilterService}}. > # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-120) Make yarn-common services robust
[ https://issues.apache.org/jira/browse/YARN-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved YARN-120. - Resolution: Duplicate Fix Version/s: 3.0.0 Superseded by YARN-530 > Make yarn-common services robust > > > Key: YARN-120 > URL: https://issues.apache.org/jira/browse/YARN-120 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Steve Loughran >Assignee: Steve Loughran > Labels: yarn > Fix For: 3.0.0 > > Attachments: MAPREDUCE-4014.patch > > > Review the yarn common services ({{CompositeService}}, > {{AbstractLivelinessMonitor}}) and make their service startup _and especially > shutdown_ more robust against out-of-lifecycle invocation and partially > complete initialization. > Write tests for these where possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-121) Yarn services to throw a YarnException on invalid state changes
[ https://issues.apache.org/jira/browse/YARN-121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved YARN-121. - Resolution: Duplicate Fix Version/s: 3.0.0 Superseded by YARN-530 > Yarn services to throw a YarnException on invalid state changes > -- > > Key: YARN-121 > URL: https://issues.apache.org/jira/browse/YARN-121 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 3.0.0 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The {{EnsureCurrentState()}} checks of services throw an > {{IllegalStateException}} if the state is wrong. If this were changed to > {{YarnException}}, wrapper services such as CompositeService could relay it > directly, instead of wrapping it in their own. > Time to implement mainly in changing the lifecycle test cases of > MAPREDUCE-3939 subtasks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reassigned YARN-530: --- Assignee: Steve Loughran > Define Service model strictly, implement AbstractService for robust > subclassing, migrate yarn-common services > - > > Key: YARN-530 > URL: https://issues.apache.org/jira/browse/YARN-530 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Steve Loughran >Assignee: Steve Loughran > > # Extend the YARN {{Service}} interface as discussed in YARN-117 > # Implement the changes in {{AbstractService}} and {{FilterService}}. > # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
Steve Loughran created YARN-530: --- Summary: Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he reassigned YARN-529: Assignee: jian he > MR app master clean staging dir when reboot command sent from RM while the MR > job succeeded > --- > > Key: YARN-529 > URL: https://issues.apache.org/jira/browse/YARN-529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: jian he >Assignee: jian he > > MR app master will clean staging dir, if the job is already succeeded and > asked to reboot. RM will consider this job unsuccessful and launch further > attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he updated YARN-529: - Description: MR app master will clean staging dir, if the job is already succeeded and asked to reboot. RM will consider this job unsuccessful and launch further attempts, further attempts will fail because staging dir is cleaned > MR app master clean staging dir when reboot command sent from RM while the MR > job succeeded > --- > > Key: YARN-529 > URL: https://issues.apache.org/jira/browse/YARN-529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: jian he > > MR app master will clean staging dir, if the job is already succeeded and > asked to reboot. RM will consider this job unsuccessful and launch further > attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-122) CompositeService should clone the Configurations it passes to children
[ https://issues.apache.org/jira/browse/YARN-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-122: Priority: Minor (was: Major) > CompositeService should clone the Configurations it passes to children > -- > > Key: YARN-122 > URL: https://issues.apache.org/jira/browse/YARN-122 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Steve Loughran >Priority: Minor > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > {{CompositeService.init(Configuration)}} saves the configuration passed in > *and* passes the same instance down to all managed services. This means a > change in the configuration of one child could propagate to all the others. > Unless this is desired, the configuration should be cloned for each child. > Fast and easy fix; tests can be added to those coming in MAPREDUCE-4014 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he updated YARN-529: - Summary: MR app master clean staging dir when reboot command sent from RM while the MR job succeeded (was: IF RM rebooted when MR job succeeded ) > MR app master clean staging dir when reboot command sent from RM while the MR > job succeeded > --- > > Key: YARN-529 > URL: https://issues.apache.org/jira/browse/YARN-529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: jian he > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-122) CompositeService should clone the Configurations it passes to children
[ https://issues.apache.org/jira/browse/YARN-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620024#comment-13620024 ] Steve Loughran commented on YARN-122: - This requires {{Configuration}} to implement {{clone()}} as a public method, so that any subclass of it, such as {{YarnConfiguration}} will still be passed down to the children > CompositeService should clone the Configurations it passes to children > -- > > Key: YARN-122 > URL: https://issues.apache.org/jira/browse/YARN-122 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Steve Loughran > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > {{CompositeService.init(Configuration)}} saves the configuration passed in > *and* passes the same instance down to all managed services. This means a > change in the configuration of one child could propagate to all the others. > Unless this is desired, the configuration should be cloned for each child. > Fast and easy fix; tests can be added to those coming in MAPREDUCE-4014 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
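The hazard YARN-122 describes — one shared Configuration instance letting a child's change propagate to its siblings — and the proposed per-child clone can be illustrated with a toy stand-in. The {{Conf}} class below is hypothetical; in Hadoop itself the copy constructor {{new Configuration(conf)}} plays the role of the public {{clone()}} discussed above.

```java
// Illustrative sketch of the CompositeService configuration-sharing concern.
// Conf is a minimal stand-in for Hadoop's Configuration (a mutable key/value
// store with a copy constructor).
import java.util.HashMap;
import java.util.Map;

public class ConfCloneSketch {
    static class Conf {
        final Map<String, String> props = new HashMap<>();
        Conf() { }
        Conf(Conf other) { props.putAll(other.props); } // "clone" via copy ctor
        void set(String k, String v) { props.put(k, v); }
        String get(String k) { return props.get(k); }
    }

    // Shared instance: child A's change is visible to child B.
    public static boolean leaksWhenShared() {
        Conf parent = new Conf();
        Conf childA = parent, childB = parent;   // same object handed to both
        childA.set("key", "a-only");
        return "a-only".equals(childB.get("key"));
    }

    // Cloned per child: changes stay local to each child.
    public static boolean isolatedWhenCloned() {
        Conf parent = new Conf();
        Conf childA = new Conf(parent), childB = new Conf(parent);
        childA.set("key", "a-only");
        return childB.get("key") == null;
    }
}
```

The subclass concern raised in the comment is why a public {{clone()}} (or copy constructor on the concrete type) matters: cloning must preserve the runtime type, so a {{YarnConfiguration}} handed to children stays a {{YarnConfiguration}}.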
[jira] [Created] (YARN-529) IF RM rebooted when MR job succeeded
jian he created YARN-529: Summary: IF RM rebooted when MR job succeeded Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Reporter: jian he -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620008#comment-13620008 ] Bikas Saha commented on YARN-392: - Yes YARN-398 but not the proposal currently in there. The alternative proposal is to have a new method in the AM-RM protocol using which the AM can blacklist nodes globally for all tasks (at all priorities) for that app. > Make it possible to schedule to specific nodes without dropping locality > > > Key: YARN-392 > URL: https://issues.apache.org/jira/browse/YARN-392 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Sandy Ryza > Attachments: YARN-392-1.patch, YARN-392.patch > > > Currently it's not possible to specify scheduling requests for specific nodes > and nowhere else. The RM automatically relaxes locality to rack and * and > assigns non-specified machines to the app. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619989#comment-13619989 ] Hadoop QA commented on YARN-528: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576592/YARN-528.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 50 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/646//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/646//console This message is automatically generated. > Make IDs read only > -- > > Key: YARN-528 > URL: https://issues.apache.org/jira/browse/YARN-528 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Attachments: YARN-528.txt, YARN-528.txt > > > I really would like to rip out most if not all of the abstraction layer that > sits in-between Protocol Buffers, the RPC, and the actual user code. We have > no plans to support any other serialization type, and the abstraction layer > just makes it more difficult to change protocols, makes changing them more > error prone, and slows down the objects themselves. 
> Completely doing that is a lot of work. This JIRA is a first step towards > that. It makes the various ID objects immutable. If this patch is well > received I will try to go through other objects/classes of objects and update > them in a similar way. > This is probably the last time we will be able to make a change like this > before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619957#comment-13619957 ] Zhijie Shen commented on YARN-193: -- {quote} I am not sure if we should allow disabling of the max memory and max vcores setting. Was it supported earlier or does the patch introduce this support? {quote} Yes, the patch introduces the support. It is already there in your previous patch. I inherited it and added some description in yarn-default.xml. I'm fine with whether the function needs to be supported or not. One risk I can imagine if the function is supported is that AM memory can exceed "yarn.nodemanager.resource.memory-mb" when DISABLE_RESOURCELIMIT_CHECK is set. Then, the problem described in YARN-389 will occur. {quote} Question - should normalization of resource requests be done inside the scheduler or in the ApplicationMasterService itself which handles the allocate call? {quote} I think it would be better to do normalization outside allocate, because allocate is not only called in ApplicationMasterService and it is not necessary that normalize is called every time allocate is called. For example, RMAppAttemptImpl#ScheduleTransition#transition doesn't require normalization because the resource has been validated during the submission stage. For another example, RMAppAttemptImpl#AMContainerAllocatedTransition#transition supplies an empty ask. {quote} Unrelated to this patch but when throwing/logging errors related to configs, we should always point to the configuration property to let the user know which property needs to be changed. Please file a separate jira for the above. {quote} I'll do that, and modify the log information when an exception is thrown in this patch. {quote} For InvalidResourceRequestException, missing javadocs for class description. {quote} I'll add the description. {quote} If maxMemory or maxVcores is set to -1, what will happen when normalize() is called? 
{quote} The normalized value has no upper bound. > Scheduler.normalizeRequest does not account for allocation requests that > exceed maximumAllocation limits > - > > Key: YARN-193 > URL: https://issues.apache.org/jira/browse/YARN-193 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 3.0.0 >Reporter: Hitesh Shah >Assignee: Zhijie Shen > Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, > MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.4.patch, > YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, > YARN-193.9.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated YARN-528: - Attachment: YARN-528.txt Upmerged > Make IDs read only > -- > > Key: YARN-528 > URL: https://issues.apache.org/jira/browse/YARN-528 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Attachments: YARN-528.txt, YARN-528.txt > > > I really would like to rip out most if not all of the abstraction layer that > sits in-between Protocol Buffers, the RPC, and the actual user code. We have > no plans to support any other serialization type, and the abstraction layer > just makes it more difficult to change protocols, makes changing them more > error prone, and slows down the objects themselves. > Completely doing that is a lot of work. This JIRA is a first step towards > that. It makes the various ID objects immutable. If this patch is well > received I will try to go through other objects/classes of objects and update > them in a similar way. > This is probably the last time we will be able to make a change like this > before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619911#comment-13619911 ] Robert Joseph Evans commented on YARN-528: -- The build failed, because it needs to be upmerged, again :( > Make IDs read only > -- > > Key: YARN-528 > URL: https://issues.apache.org/jira/browse/YARN-528 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Attachments: YARN-528.txt > > > I really would like to rip out most if not all of the abstraction layer that > sits in-between Protocol Buffers, the RPC, and the actual user code. We have > no plans to support any other serialization type, and the abstraction layer > just makes it more difficult to change protocols, makes changing them more > error prone, and slows down the objects themselves. > Completely doing that is a lot of work. This JIRA is a first step towards > that. It makes the various ID objects immutable. If this patch is well > received I will try to go through other objects/classes of objects and update > them in a similar way. > This is probably the last time we will be able to make a change like this > before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619825#comment-13619825 ] Hudson commented on YARN-475: - Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/]) YARN-475. Remove a unused constant in the public API - ApplicationConstants.AM_APP_ATTEMPT_ID_ENV. Contributed by Hitesh Shah. (Revision 1463033) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463033 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java > Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in > an AM's environment > --- > > Key: YARN-475 > URL: https://issues.apache.org/jira/browse/YARN-475 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Fix For: 2.0.5-beta > > Attachments: YARN-475.1.patch > > > AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive > the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619824#comment-13619824 ] Hudson commented on YARN-447: - Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/]) YARN-447. Move ApplicationComparator in CapacityScheduler to use comparator in ApplicationId. Contributed by Nemon Lou. (Revision 1463405) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463405 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java > applicationComparator improvement for CS > > > Key: YARN-447 > URL: https://issues.apache.org/jira/browse/YARN-447 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: nemon lou >Assignee: nemon lou >Priority: Minor > Fix For: 2.0.5-beta > > Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, > YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch > > > Now the compare code is : > return a1.getApplicationId().getId() - a2.getApplicationId().getId(); > Will be replaced with : > return a1.getApplicationId().compareTo(a2.getApplicationId()); > This will bring some benefits: > 1,leave applicationId compare logic to ApplicationId class; > 2,In future's HA mode,cluster time stamp 
may change; the ApplicationId class already takes care of this condition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
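The rationale for delegating to {{compareTo}} can be sketched outside YARN: a subtraction-based comparator overflows when the operands' difference exceeds the int range, and it ignores the cluster timestamp entirely. A minimal illustration (the AppId class below is a hypothetical stand-in for this demo, not the real ApplicationId):

```java
import java.util.Comparator;

// Toy stand-in for ApplicationId: a cluster timestamp plus a sequence number.
// (Hypothetical class for illustration; not the real org.apache.hadoop.yarn API.)
final class AppId implements Comparable<AppId> {
    final long clusterTimestamp;
    final int id;

    AppId(long clusterTimestamp, int id) {
        this.clusterTimestamp = clusterTimestamp;
        this.id = id;
    }

    @Override
    public int compareTo(AppId other) {
        // Compare timestamp first, then sequence number, without subtraction.
        int byTs = Long.compare(clusterTimestamp, other.clusterTimestamp);
        return byTs != 0 ? byTs : Integer.compare(id, other.id);
    }
}

public class ComparatorDemo {
    public static void main(String[] args) {
        // Subtraction overflows: Integer.MIN_VALUE - 1 wraps around to a positive
        // value, so the smaller id incorrectly sorts after the larger one.
        Comparator<Integer> bySubtraction = (a, b) -> a - b;
        System.out.println(bySubtraction.compare(Integer.MIN_VALUE, 1) > 0); // true (wrong order)

        // compareTo-style comparison stays correct for the same inputs.
        System.out.println(new AppId(100L, Integer.MIN_VALUE).compareTo(new AppId(100L, 1)) < 0); // true

        // A changed cluster timestamp (e.g. after an RM restart) is also honoured.
        System.out.println(new AppId(100L, 5).compareTo(new AppId(200L, 1)) < 0); // true
    }
}
```

The overflow case is pathological for application IDs in practice, but delegating to compareTo removes the hazard and keeps the ordering logic in one place.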
[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
[ https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619821#comment-13619821 ] Hudson commented on YARN-516: - Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/]) YARN-516. Fix failure in TestContainerLocalizer caused by HADOOP-9357. Contributed by Andrew Wang. (Revision 1463362) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463362 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java > TestContainerLocalizer.testContainerLocalizerMain is failing > > > Key: YARN-516 > URL: https://issues.apache.org/jira/browse/YARN-516 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Andrew Wang > Fix For: 2.0.5-beta > > Attachments: YARN-516.txt > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
[ https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619822#comment-13619822 ] Hudson commented on YARN-524: - Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/]) YARN-524 TestYarnVersionInfo failing if generated properties doesn't include an SVN URL (Revision 1463300) Result = SUCCESS stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463300 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java > TestYarnVersionInfo failing if generated properties doesn't include an SVN URL > -- > > Key: YARN-524 > URL: https://issues.apache.org/jira/browse/YARN-524 > Project: Hadoop YARN > Issue Type: Bug > Components: api >Affects Versions: 3.0.0 > Environment: OS/X with branch off github >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 3.0.0 > > Attachments: YARN-524.patch > > > {{TestYarnVersionInfo}} fails when the {{YarnVersionInfo.getUrl()}} call > returns {{Unknown}}, i.e. when that is the value inserted into the property file -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
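The failure mode here is that the generated build-properties file may lack an SVN URL, so the version-info getter returns the literal sentinel "Unknown", and a test that insists on a URL then fails. A sketch of a tolerant check (the getUrl helper below is a hypothetical stand-in for YarnVersionInfo.getUrl(), not the actual implementation):

```java
import java.util.Properties;

public class VersionInfoCheck {
    // Hypothetical stand-in for YarnVersionInfo.getUrl(): reads a generated
    // property and falls back to the "Unknown" sentinel when it is absent.
    static String getUrl(Properties props) {
        return props.getProperty("url", "Unknown");
    }

    public static void main(String[] args) {
        Properties withUrl = new Properties();
        withUrl.setProperty("url", "https://svn.example.org/repos/trunk");

        Properties withoutUrl = new Properties(); // no SVN URL was generated

        // A brittle test asserts the value is always a URL; a tolerant one
        // accepts either a URL or the documented "Unknown" fallback.
        for (Properties p : new Properties[] {withUrl, withoutUrl}) {
            String url = getUrl(p);
            boolean ok = url.equals("Unknown") || url.startsWith("http");
            System.out.println(ok);
        }
    }
}
```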
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619817#comment-13619817 ] Hudson commented on YARN-309: - Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/]) YARN-309. Changed NodeManager to obtain heart-beat interval from the ResourceManager. Contributed by Xuan Gong. (Revision 1463346) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463346 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/YarnServerBuilderUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java > Make RM provide heartbeat interval to NM > > > Key: YARN-309 > URL: https://issues.apache.org/jira/browse/YARN-309 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.0.5-beta > > Attachments: YARN-309.10.patch, YARN-309.11.patch, YARN-309.1.patch, > YARN-309-20130331.txt, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, > YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch, YARN-309.9.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
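The change above has the NodeManager take its heart-beat interval from each NodeHeartbeatResponse instead of a value fixed once in its own configuration. A rough sketch of that control flow (class and field names below are illustrative stand-ins, not the actual YARN protocol records):

```java
import java.util.Iterator;
import java.util.List;

// Minimal stand-in for the RM's heartbeat response carrying the next interval.
final class HeartbeatResponse {
    final long nextHeartbeatIntervalMs;
    HeartbeatResponse(long nextHeartbeatIntervalMs) {
        this.nextHeartbeatIntervalMs = nextHeartbeatIntervalMs;
    }
}

public class HeartbeatLoopSketch {
    // The NM keeps whatever interval the RM last told it to use, rather than
    // a value fixed at start-up in the NM's own configuration.
    static long runLoop(Iterator<HeartbeatResponse> responses, long defaultIntervalMs) {
        long interval = defaultIntervalMs;
        while (responses.hasNext()) {
            HeartbeatResponse resp = responses.next();
            if (resp.nextHeartbeatIntervalMs > 0) {
                interval = resp.nextHeartbeatIntervalMs; // RM can retune this at runtime
            }
            // ... a real updater would sleep(interval) and send node status here ...
        }
        return interval;
    }

    public static void main(String[] args) {
        List<HeartbeatResponse> fromRm = List.of(
            new HeartbeatResponse(1000), new HeartbeatResponse(3000));
        // Starts at the 500 ms default, ends on the RM-provided 3000 ms.
        System.out.println(runLoop(fromRm.iterator(), 500));
    }
}
```

Centralizing the interval on the RM side lets operators slow heartbeats cluster-wide under load without restarting every NodeManager.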
[jira] [Commented] (YARN-527) Local filecache mkdir fails
[ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619797#comment-13619797 ] Knut O. Hellan commented on YARN-527: - Digging through the code, it looks to me like the native Java File.mkdirs is used to actually create the directory, and it will not give information about why it failed. If that is the case, then I guess this issue is actually a feature request that YARN should be better at cleaning up old file caches so that this situation will not happen. > Local filecache mkdir fails > --- > > Key: YARN-527 > URL: https://issues.apache.org/jira/browse/YARN-527 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.0-alpha > Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes > and six worker nodes. >Reporter: Knut O. Hellan >Priority: Minor > Attachments: yarn-site.xml > > > Jobs failed with no other explanation than this stack trace: > 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > Diagnostics report from attempt_1364591875320_0017_m_00_0: > java.io.IOException: mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at 
java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Manually creating the directory worked. This behavior was common to at least > several nodes in the cluster. > The situation was resolved by removing and recreating all > /disk?/yarn/local/filecache directories on all nodes. > It is unclear whether Yarn struggled with the number of files or if there > were corrupt files in the caches. The situation was triggered by a node dying. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
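The point about File.mkdirs can be demonstrated directly: it reports failure only as a boolean, while java.nio.file.Files.createDirectories throws an exception identifying the cause. A small illustration using temporary paths (not the NodeManager's actual filecache layout):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MkdirDiagnostics {
    public static void main(String[] args) throws IOException {
        // Create a regular file, then try to make a directory "under" it;
        // this must fail on any platform.
        Path base = Files.createTempDirectory("mkdir-demo");
        Path blocker = Files.createFile(base.resolve("blocker"));

        // File.mkdirs(): the only diagnostic on failure is the boolean itself.
        File target = new File(blocker.toFile(), "child");
        System.out.println(target.mkdirs());

        // Files.createDirectories(): the thrown exception names the offending path
        // (here a FileAlreadyExistsException for the non-directory parent).
        try {
            Files.createDirectories(blocker.resolve("child"));
        } catch (IOException e) {
            System.out.println("threw with reason");
        }
    }
}
```

This is why the NM log above can only say "mkdir ... failed" with no cause attached.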
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619778#comment-13619778 ] Thomas Graves commented on YARN-392: Bikas, when you say creating an API for blacklisting a set of nodes, are you basically referring to YARN-398 or something else? > Make it possible to schedule to specific nodes without dropping locality > > > Key: YARN-392 > URL: https://issues.apache.org/jira/browse/YARN-392 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Sandy Ryza > Attachments: YARN-392-1.patch, YARN-392.patch > > > Currently it's not possible to specify scheduling requests for specific nodes > and nowhere else. The RM automatically relaxes locality to rack and * and > assigns non-specified machines to the app. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619768#comment-13619768 ] Hudson commented on YARN-447: - Integrated in Hadoop-Hdfs-trunk #1362 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/]) YARN-447. Move ApplicationComparator in CapacityScheduler to use comparator in ApplicationId. Contributed by Nemon Lou. (Revision 1463405) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463405 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java > applicationComparator improvement for CS > > > Key: YARN-447 > URL: https://issues.apache.org/jira/browse/YARN-447 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: nemon lou >Assignee: nemon lou >Priority: Minor > Fix For: 2.0.5-beta > > Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, > YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch > > > Now the compare code is : > return a1.getApplicationId().getId() - a2.getApplicationId().getId(); > Will be replaced with : > return a1.getApplicationId().compareTo(a2.getApplicationId()); > This will bring some benefits: > 1,leave applicationId compare logic to ApplicationId class; > 2,In future's HA mode,cluster time stamp may 
change; the ApplicationId class already takes care of this condition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
[ https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619766#comment-13619766 ] Hudson commented on YARN-524: - Integrated in Hadoop-Hdfs-trunk #1362 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/]) YARN-524 TestYarnVersionInfo failing if generated properties doesn't include an SVN URL (Revision 1463300) Result = FAILURE stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463300 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java > TestYarnVersionInfo failing if generated properties doesn't include an SVN URL > -- > > Key: YARN-524 > URL: https://issues.apache.org/jira/browse/YARN-524 > Project: Hadoop YARN > Issue Type: Bug > Components: api >Affects Versions: 3.0.0 > Environment: OS/X with branch off github >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 3.0.0 > > Attachments: YARN-524.patch > > > {{TestYarnVersionInfo}} fails when the {{YarnVersionInfo.getUrl()}} call > returns {{Unknown}}, i.e. when that is the value inserted into the property file -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
[ https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619765#comment-13619765 ] Hudson commented on YARN-516: - Integrated in Hadoop-Hdfs-trunk #1362 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/]) YARN-516. Fix failure in TestContainerLocalizer caused by HADOOP-9357. Contributed by Andrew Wang. (Revision 1463362) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463362 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java > TestContainerLocalizer.testContainerLocalizerMain is failing > > > Key: YARN-516 > URL: https://issues.apache.org/jira/browse/YARN-516 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Andrew Wang > Fix For: 2.0.5-beta > > Attachments: YARN-516.txt > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619761#comment-13619761 ] Hudson commented on YARN-309: - Integrated in Hadoop-Hdfs-trunk #1362 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/]) YARN-309. Changed NodeManager to obtain heart-beat interval from the ResourceManager. Contributed by Xuan Gong. (Revision 1463346) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463346 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/YarnServerBuilderUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java > Make RM provide heartbeat interval to NM > > > Key: YARN-309 > URL: https://issues.apache.org/jira/browse/YARN-309 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.0.5-beta > > Attachments: YARN-309.10.patch, YARN-309.11.patch, YARN-309.1.patch, > YARN-309-20130331.txt, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, > YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch, YARN-309.9.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619750#comment-13619750 ] Hadoop QA commented on YARN-528: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576553/YARN-528.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 49 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/645//console This message is automatically generated. > Make IDs read only > -- > > Key: YARN-528 > URL: https://issues.apache.org/jira/browse/YARN-528 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Attachments: YARN-528.txt > > > I really would like to rip out most if not all of the abstraction layer that > sits in-between Protocol Buffers, the RPC, and the actual user code. We have > no plans to support any other serialization type, and the abstraction layer > just makes it more difficult to change protocols, makes changing them more > error prone, and slows down the objects themselves. > Completely doing that is a lot of work. This JIRA is a first step towards > that. It makes the various ID objects immutable. If this patch is well > received I will try to go through other objects/classes of objects and update > them in a similar way. > This is probably the last time we will be able to make a change like this > before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans reassigned YARN-528: Assignee: Robert Joseph Evans > Make IDs read only > -- > > Key: YARN-528 > URL: https://issues.apache.org/jira/browse/YARN-528 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Attachments: YARN-528.txt > > > I really would like to rip out most if not all of the abstraction layer that > sits in-between Protocol Buffers, the RPC, and the actual user code. We have > no plans to support any other serialization type, and the abstraction layer > just makes it more difficult to change protocols, makes changing them more > error prone, and slows down the objects themselves. > Completely doing that is a lot of work. This JIRA is a first step towards > that. It makes the various ID objects immutable. If this patch is well > received I will try to go through other objects/classes of objects and update > them in a similar way. > This is probably the last time we will be able to make a change like this > before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated YARN-528: - Attachment: YARN-528.txt This patch contains changes to both Map/Reduce IDs as well as YARN APIs. I don't really want to split them up right now, but I am happy to file a separate JIRA for tracking purposes if the community decides this is a direction we want to go in. > Make IDs read only > -- > > Key: YARN-528 > URL: https://issues.apache.org/jira/browse/YARN-528 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Robert Joseph Evans > Attachments: YARN-528.txt > > > I really would like to rip out most if not all of the abstraction layer that > sits in-between Protocol Buffers, the RPC, and the actual user code. We have > no plans to support any other serialization type, and the abstraction layer > just makes it more difficult to change protocols, makes changing them more > error prone, and slows down the objects themselves. > Completely doing that is a lot of work. This JIRA is a first step towards > that. It makes the various ID objects immutable. If this patch is well > received I will try to go through other objects/classes of objects and update > them in a similar way. > This is probably the last time we will be able to make a change like this > before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-528) Make IDs read only
Robert Joseph Evans created YARN-528: Summary: Make IDs read only Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Improvement Reporter: Robert Joseph Evans I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
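The immutability being proposed can be sketched as a plain value class: final fields, a factory method instead of setters, and equality/ordering defined once. (A hypothetical class for illustration, not the actual YARN ApplicationId API:)

```java
public final class ImmutableAppId implements Comparable<ImmutableAppId> {
    private final long clusterTimestamp;
    private final int id;

    private ImmutableAppId(long clusterTimestamp, int id) {
        this.clusterTimestamp = clusterTimestamp;
        this.id = id;
    }

    // Factory instead of setters: once built, an ID can never change, so it is
    // safe to share across threads and use as a map key.
    public static ImmutableAppId newInstance(long clusterTimestamp, int id) {
        return new ImmutableAppId(clusterTimestamp, id);
    }

    public long getClusterTimestamp() { return clusterTimestamp; }
    public int getId() { return id; }

    @Override
    public int compareTo(ImmutableAppId o) {
        int byTs = Long.compare(clusterTimestamp, o.clusterTimestamp);
        return byTs != 0 ? byTs : Integer.compare(id, o.id);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ImmutableAppId)) return false;
        ImmutableAppId other = (ImmutableAppId) o;
        return clusterTimestamp == other.clusterTimestamp && id == other.id;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(clusterTimestamp) + id;
    }

    public static void main(String[] args) {
        ImmutableAppId a = ImmutableAppId.newInstance(100L, 7);
        ImmutableAppId b = ImmutableAppId.newInstance(100L, 7);
        System.out.println(a.equals(b));
        System.out.println(a.compareTo(ImmutableAppId.newInstance(100L, 8)) < 0);
    }
}
```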
[jira] [Updated] (YARN-525) make CS node-locality-delay refreshable
[ https://issues.apache.org/jira/browse/YARN-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-525: --- Issue Type: Improvement (was: Bug) Summary: make CS node-locality-delay refreshable (was: yarn.scheduler.capacity.node-locality-delay doesn't change with rmadmin -refreshQueues) > make CS node-locality-delay refreshable > --- > > Key: YARN-525 > URL: https://issues.apache.org/jira/browse/YARN-525 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.0.3-alpha, 0.23.7 >Reporter: Thomas Graves > > the config yarn.scheduler.capacity.node-locality-delay doesn't change when > you change the value in capacity_scheduler.xml and then run yarn rmadmin > -refreshQueues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
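The reported behaviour is the difference between caching a configuration value at scheduler initialization and re-reading it on each refresh. A generic sketch of the two patterns (names are illustrative; this is not the CapacityScheduler code):

```java
import java.util.HashMap;
import java.util.Map;

public class RefreshableConfSketch {
    // Stand-in for the scheduler configuration backing store.
    static Map<String, String> conf = new HashMap<>();

    // Cached at init: later refreshes are invisible, mirroring the reported bug.
    static int cachedDelay;

    static void init() {
        cachedDelay = Integer.parseInt(conf.getOrDefault("node-locality-delay", "-1"));
    }

    // Refresh-aware: re-reads the store each time it is consulted.
    static int currentDelay() {
        return Integer.parseInt(conf.getOrDefault("node-locality-delay", "-1"));
    }

    public static void main(String[] args) {
        conf.put("node-locality-delay", "40");
        init();

        // Operator edits the scheduler config and runs "rmadmin -refreshQueues".
        conf.put("node-locality-delay", "10");

        System.out.println(cachedDelay);    // stale value from init time
        System.out.println(currentDelay()); // value after the refresh
    }
}
```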
[jira] [Commented] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619697#comment-13619697 ] Hudson commented on YARN-447: - Integrated in Hadoop-Yarn-trunk #173 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/173/]) YARN-447. Move ApplicationComparator in CapacityScheduler to use comparator in ApplicationId. Contributed by Nemon Lou. (Revision 1463405) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463405 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java > applicationComparator improvement for CS > > > Key: YARN-447 > URL: https://issues.apache.org/jira/browse/YARN-447 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: nemon lou >Assignee: nemon lou >Priority: Minor > Fix For: 2.0.5-beta > > Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, > YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch > > > Now the compare code is : > return a1.getApplicationId().getId() - a2.getApplicationId().getId(); > Will be replaced with : > return a1.getApplicationId().compareTo(a2.getApplicationId()); > This will bring some benefits: > 1,leave applicationId compare logic to ApplicationId class; > 2,In future's HA mode,cluster time stamp may 
change; the ApplicationId class already takes care of this condition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
[ https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619695#comment-13619695 ] Hudson commented on YARN-524: - Integrated in Hadoop-Yarn-trunk #173 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/173/]) YARN-524 TestYarnVersionInfo failing if generated properties doesn't include an SVN URL (Revision 1463300) Result = SUCCESS stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463300 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java > TestYarnVersionInfo failing if generated properties doesn't include an SVN URL > -- > > Key: YARN-524 > URL: https://issues.apache.org/jira/browse/YARN-524 > Project: Hadoop YARN > Issue Type: Bug > Components: api >Affects Versions: 3.0.0 > Environment: OS/X with branch off github >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 3.0.0 > > Attachments: YARN-524.patch > > > {{TestYarnVersionInfo}} fails when the {{YarnVersionInfo.getUrl()}} call > returns {{Unknown}}, i.e. when that is the value inserted into the property file -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
[ https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619694#comment-13619694 ] Hudson commented on YARN-516: - Integrated in Hadoop-Yarn-trunk #173 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/173/]) YARN-516. Fix failure in TestContainerLocalizer caused by HADOOP-9357. Contributed by Andrew Wang. (Revision 1463362) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463362 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java > TestContainerLocalizer.testContainerLocalizerMain is failing > > > Key: YARN-516 > URL: https://issues.apache.org/jira/browse/YARN-516 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Andrew Wang > Fix For: 2.0.5-beta > > Attachments: YARN-516.txt > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619690#comment-13619690 ] Hudson commented on YARN-309: - Integrated in Hadoop-Yarn-trunk #173 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/173/]) YARN-309. Changed NodeManager to obtain heart-beat interval from the ResourceManager. Contributed by Xuan Gong. (Revision 1463346) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463346 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/YarnServerBuilderUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java > Make RM provide heartbeat interval to NM > > > Key: YARN-309 > URL: https://issues.apache.org/jira/browse/YARN-309 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.0.5-beta > > Attachments: YARN-309.10.patch, YARN-309.11.patch, YARN-309.1.patch, > YARN-309-20130331.txt, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, > YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch, YARN-309.9.patch > >
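The YARN-309 change has the NodeManager obtain its heart-beat interval from each ResourceManager response rather than from local configuration. A simplified, hypothetical sketch of that pattern (the class and field names here are invented, not the actual `NodeHeartbeatResponse` API):

```java
// Simplified sketch of the YARN-309 idea (invented names, not the real API):
// a node-side heartbeat loop that takes its sleep interval from each server
// response, so the ResourceManager can tune the interval cluster-wide.
public class HeartbeatLoop {
    // Hypothetical response carrying the RM-chosen interval in milliseconds.
    static class HeartbeatResponse {
        final long nextIntervalMs;
        HeartbeatResponse(long nextIntervalMs) { this.nextIntervalMs = nextIntervalMs; }
    }

    interface ResourceTracker {
        HeartbeatResponse nodeHeartbeat();
    }

    // Runs a fixed number of heartbeats, sleeping for whatever interval the
    // most recent response requested; returns the last interval used.
    static long runHeartbeats(ResourceTracker tracker, int count) {
        long lastInterval = 0;
        for (int i = 0; i < count; i++) {
            HeartbeatResponse resp = tracker.nodeHeartbeat();
            lastInterval = resp.nextIntervalMs;
            try {
                Thread.sleep(lastInterval);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
        }
        return lastInterval;
    }

    // Demo: a fake tracker that doubles the interval on every beat.
    static long demo() {
        ResourceTracker doubling = new ResourceTracker() {
            long interval = 1;
            public HeartbeatResponse nodeHeartbeat() {
                return new HeartbeatResponse(interval *= 2);
            }
        };
        return runHeartbeats(doubling, 3); // beats sleep 2, 4, then 8 ms
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the design is that the server stays in control: a node never caches a locally configured interval, so the RM can slow down or speed up the whole cluster on the next round of heartbeats.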
[jira] [Updated] (YARN-527) Local filecache mkdir fails
[ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Knut O. Hellan updated YARN-527: Attachment: yarn-site.xml > Local filecache mkdir fails > --- > > Key: YARN-527 > URL: https://issues.apache.org/jira/browse/YARN-527 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.0-alpha > Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes > and six worker nodes. >Reporter: Knut O. Hellan >Priority: Minor > Attachments: yarn-site.xml > > > Jobs failed with no other explanation than this stack trace: > 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1364591875320_0017_m_00_0: > java.io.IOException: mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Manually creating the directory worked. This behavior was common to at least > several nodes in the cluster. > The situation was resolved by removing and recreating all > /disk?/yarn/local/filecache directories on all nodes. > It is unclear whether Yarn struggled with the number of files or if there > were corrupt files in the caches. The situation was triggered by a node dying.
[jira] [Created] (YARN-527) Local filecache mkdir fails
Knut O. Hellan created YARN-527: --- Summary: Local filecache mkdir fails Key: YARN-527 URL: https://issues.apache.org/jira/browse/YARN-527 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.0-alpha Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes and six worker nodes. Reporter: Knut O. Hellan Priority: Minor Jobs failed with no other explanation than this stack trace: 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1364591875320_0017_m_00_0: java.io.IOException: mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Manually creating the directory worked. This behavior was common to at least several nodes in the cluster.
The situation was resolved by removing and recreating all /disk?/yarn/local/filecache directories on all nodes. It is unclear whether Yarn struggled with the number of files or if there were corrupt files in the caches. The situation was triggered by a node dying.
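The failure above comes from `FileContext.mkdir` refusing to create a filecache directory, and the workaround was to clear the cache directories by hand. A hedged sketch of a more defensive local mkdir, not the actual `FSDownload` code, illustrating the kind of checks that make this failure mode easier to diagnose (the class and method names are invented):

```java
// Hypothetical defensive mkdir (invented names, not the FSDownload code):
// distinguish "already a directory" (fine), "blocked by a stale file"
// (report clearly), and "mkdirs failed" (report the path), rather than
// surfacing only a bare "mkdir of <path> failed".
import java.io.File;
import java.io.IOException;

public class SafeMkdir {
    static void mkdirChecked(File dir) throws IOException {
        if (dir.isDirectory()) {
            return; // already present: idempotent no-op
        }
        if (dir.exists()) {
            // A stale regular file (e.g. leftover cache entry) blocks the path.
            throw new IOException("Path exists but is not a directory: " + dir);
        }
        if (!dir.mkdirs()) {
            throw new IOException("mkdir of " + dir + " failed");
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = new File(System.getProperty("java.io.tmpdir"), "filecache-demo");
        mkdirChecked(tmp);                     // creates the directory
        mkdirChecked(tmp);                     // second call is a no-op
        System.out.println(tmp.isDirectory());
        tmp.delete();                          // clean up the demo directory
    }
}
```

Separating the "exists but is a file" case from the plain mkdir failure matters here because, as the report notes, the cluster recovered only after the filecache directories were wiped, which is consistent with stale entries blocking directory creation.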