[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621759#comment-13621759 ] Roger Hoover commented on YARN-412: --- [~acmurthy], I've got the patch back in shape. Can you please review it or let me know what the next step is? > FifoScheduler incorrectly checking for node locality > > > Key: YARN-412 > URL: https://issues.apache.org/jira/browse/YARN-412 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Roger Hoover >Assignee: Roger Hoover >Priority: Minor > Labels: patch > Attachments: YARN-412.patch > > > In the FifoScheduler, the assignNodeLocalContainers method is checking if the > data is local to a node by searching for the nodeAddress of the node in the > set of outstanding requests for the app. This seems to be incorrect as it > should be checking hostname instead. The offending line of code is 455: > application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); > Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses > are a concatenation of hostname and command port (e.g. host1.foo.com:1234). > In the CapacityScheduler, it's done using hostname. See > LeafQueue.assignNodeLocalContainers, line 1129 > application.getResourceRequest(priority, node.getHostName()); > Note that this bug does not affect the actual scheduling decisions made by > the FifoScheduler because even though it incorrectly determines that a request > is not local to the node, it will still schedule the request immediately > because it's rack-local. However, this bug may be adversely affecting the > reporting of job status by underreporting the number of tasks that were node > local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
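To make the hostname-versus-nodeAddress mismatch described in the issue concrete, here is a self-contained sketch that uses a plain map in place of the scheduler's request table; the class, map, and numbers are illustrative, not actual FifoScheduler code:
{code:java}
import java.util.HashMap;
import java.util.Map;

// Outstanding requests are keyed by hostname, so a lookup keyed by
// "hostname:port" never finds the node-local request.
public class LocalityLookupDemo {
  public static void main(String[] args) {
    Map<String, Integer> outstandingRequests = new HashMap<String, Integer>();
    outstandingRequests.put("host1.foo.com", 3);               // request keyed by hostname

    String nodeAddress = "host1.foo.com:1234";                 // hostname + command port
    String hostName = "host1.foo.com";

    System.out.println(outstandingRequests.get(nodeAddress));  // null -> looks like no node-local request
    System.out.println(outstandingRequests.get(hostName));     // 3    -> node-local request found
  }
}
{code}
This is why the patch keys the FifoScheduler lookup on node.getHostName(), matching the CapacityScheduler call quoted in the description.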
[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621735#comment-13621735 ] Hadoop QA commented on YARN-412: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576914/YARN-412.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/669//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/669//console This message is automatically generated. > FifoScheduler incorrectly checking for node locality > > > Key: YARN-412 > URL: https://issues.apache.org/jira/browse/YARN-412 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Roger Hoover >Assignee: Roger Hoover >Priority: Minor > Labels: patch > Attachments: YARN-412.patch > > > In the FifoScheduler, the assignNodeLocalContainers method is checking if the > data is local to a node by searching for the nodeAddress of the node in the > set of outstanding requests for the app. This seems to be incorrect as it > should be checking hostname instead. The offending line of code is 455: > application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); > Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses > are a concatenation of hostname and command port (e.g. host1.foo.com:1234). > In the CapacityScheduler, it's done using hostname. See > LeafQueue.assignNodeLocalContainers, line 1129 > application.getResourceRequest(priority, node.getHostName()); > Note that this bug does not affect the actual scheduling decisions made by > the FifoScheduler because even though it incorrectly determines that a request > is not local to the node, it will still schedule the request immediately > because it's rack-local. However, this bug may be adversely affecting the > reporting of job status by underreporting the number of tasks that were node > local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
[ https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621733#comment-13621733 ] Hudson commented on YARN-536: - Integrated in Hadoop-trunk-Commit #3560 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3560/]) YARN-536. Removed the unused objects ContainerStatus and ContainerState from Container which also don't belong to the container. Contributed by Xuan Gong. (Revision 1464271) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464271 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Container.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java > Remove ContainerStatus, ContainerState from Container api interface as they > will not be called by the container object > -- > > Key: YARN-536 > URL: https://issues.apache.org/jira/browse/YARN-536 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.0.5-beta > > Attachments: YARN-536.1.patch, YARN-536.2.patch > > > Remove containerstate, containerStatus from container interface. They will > not be called by container object -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
[ https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621720#comment-13621720 ] Vinod Kumar Vavilapalli commented on YARN-536: -- +1, this looks good. Checking it in. > Remove ContainerStatus, ContainerState from Container api interface as they > will not be called by the container object > -- > > Key: YARN-536 > URL: https://issues.apache.org/jira/browse/YARN-536 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-536.1.patch, YARN-536.2.patch > > > Remove containerstate, containerStatus from container interface. They will > not be called by container object -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621709#comment-13621709 ] Hitesh Shah commented on YARN-412: -- @Roger, for future reference (may not be applicable to this jira), it is good to leave earlier patch attachments lying around and not delete them when uploading newer patches. They can be used to trace review comments/feedback, etc. As for the hadoop-common mvn eclipse:eclipse failure, it can be ignored for now. It is a known issue with an open jira that has not been addressed yet. > FifoScheduler incorrectly checking for node locality > > > Key: YARN-412 > URL: https://issues.apache.org/jira/browse/YARN-412 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Roger Hoover >Assignee: Roger Hoover >Priority: Minor > Labels: patch > Attachments: YARN-412.patch > > > In the FifoScheduler, the assignNodeLocalContainers method is checking if the > data is local to a node by searching for the nodeAddress of the node in the > set of outstanding requests for the app. This seems to be incorrect as it > should be checking hostname instead. The offending line of code is 455: > application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); > Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses > are a concatenation of hostname and command port (e.g. host1.foo.com:1234). > In the CapacityScheduler, it's done using hostname. See > LeafQueue.assignNodeLocalContainers, line 1129 > application.getResourceRequest(priority, node.getHostName()); > Note that this bug does not affect the actual scheduling decisions made by > the FifoScheduler because even though it incorrectly determines that a request > is not local to the node, it will still schedule the request immediately > because it's rack-local. However, this bug may be adversely affecting the > reporting of job status by underreporting the number of tasks that were node > local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated YARN-412: -- Attachment: (was: YARN-412.patch) > FifoScheduler incorrectly checking for node locality > > > Key: YARN-412 > URL: https://issues.apache.org/jira/browse/YARN-412 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Roger Hoover >Assignee: Roger Hoover >Priority: Minor > Labels: patch > Attachments: YARN-412.patch > > > In the FifoScheduler, the assignNodeLocalContainers method is checking if the > data is local to a node by searching for the nodeAddress of the node in the > set of outstanding requests for the app. This seems to be incorrect as it > should be checking hostname instead. The offending line of code is 455: > application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); > Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses > are a concatenation of hostname and command port (e.g. host1.foo.com:1234). > In the CapacityScheduler, it's done using hostname. See > LeafQueue.assignNodeLocalContainers, line 1129 > application.getResourceRequest(priority, node.getHostName()); > Note that this bug does not affect the actual scheduling decisions made by > the FifoScheduler because even though it incorrectly determines that a request > is not local to the node, it will still schedule the request immediately > because it's rack-local. However, this bug may be adversely affecting the > reporting of job status by underreporting the number of tasks that were node > local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated YARN-412: -- Attachment: YARN-412.patch > FifoScheduler incorrectly checking for node locality > > > Key: YARN-412 > URL: https://issues.apache.org/jira/browse/YARN-412 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Roger Hoover >Assignee: Roger Hoover >Priority: Minor > Labels: patch > Attachments: YARN-412.patch > > > In the FifoScheduler, the assignNodeLocalContainers method is checking if the > data is local to a node by searching for the nodeAddress of the node in the > set of outstanding requests for the app. This seems to be incorrect as it > should be checking hostname instead. The offending line of code is 455: > application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); > Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses > are a concatenation of hostname and command port (e.g. host1.foo.com:1234). > In the CapacityScheduler, it's done using hostname. See > LeafQueue.assignNodeLocalContainers, line 1129 > application.getResourceRequest(priority, node.getHostName()); > Note that this bug does not affect the actual scheduling decisions made by > the FifoScheduler because even though it incorrectly determines that a request > is not local to the node, it will still schedule the request immediately > because it's rack-local. However, this bug may be adversely affecting the > reporting of job status by underreporting the number of tasks that were node > local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs
[ https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621573#comment-13621573 ] Chris Nauroth commented on YARN-535: {{TestDistributedShell#setup}} has nearly identical code to overwrite yarn-site.xml. > TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during > write phase, breaks later test runs > > > Key: YARN-535 > URL: https://issues.apache.org/jira/browse/YARN-535 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Affects Versions: 3.0.0 > Environment: OS/X laptop, HFS+ filesystem >Reporter: Steve Loughran >Priority: Minor > > The setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. > As {{Configuration.writeXml()}} does a reread of all resources, this will > break if the (open-for-writing) resource is already visible as an empty file. > This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks > later test runs, because it is not overwritten by later incremental builds, > due to timestamps. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
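One way to sidestep the corruption pattern described above is to let Configuration.writeXml() serialize into memory first and only then write the target file, so a half-written yarn-site.xml is never visible to the re-read. A minimal sketch under that assumption; the path is taken from the description, and this is not the project's actual fix:
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;

public class SafeConfigWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("yarn.resourcemanager.hostname", "localhost");   // illustrative setting

    // Serialize first: any re-read of resources happens before the target file is touched.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    conf.writeXml(buffer);

    // Only complete XML ever reaches target/test-classes/yarn-site.xml.
    OutputStream out = new FileOutputStream("target/test-classes/yarn-site.xml");
    try {
      buffer.writeTo(out);
    } finally {
      out.close();
    }
  }
}
{code}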
[jira] [Created] (YARN-540) RM state store not cleaned if job succeeds but RM shutdown and restart-dispatcher stopped before it can process REMOVE_APP event
Jian He created YARN-540: Summary: RM state store not cleaned if job succeeds but RM shutdown and restart-dispatcher stopped before it can process REMOVE_APP event Key: YARN-540 URL: https://issues.apache.org/jira/browse/YARN-540 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He When a job succeeds and successfully calls finishApplicationMaster, but the RM is shut down and restarted and the dispatcher is stopped before it can process the REMOVE_APP event, the state for that application is never removed. The next time the RM comes back, it will reload the existing state files even though the job has succeeded -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-467: --- Attachment: yarn-467-testCode.tar This will help in testing the distributed cache patch. > Jobs fail during resource localization when public distributed-cache hits > unix directory limits > --- > > Key: YARN-467 > URL: https://issues.apache.org/jira/browse/YARN-467 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Fix For: 2.0.5-beta > > Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, > yarn-467-20130322.3.patch, yarn-467-20130322.patch, > yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, > yarn-467-20130401.patch, yarn-467-20130402.1.patch, > yarn-467-20130402.2.patch, yarn-467-20130402.patch, yarn-467-testCode.tar > > > If we have multiple jobs which use the distributed cache with many small > files, the directory limit is reached before the cache size limit and localization fails > to create any new directories in the file cache (PUBLIC). The jobs start failing with > the below exception. > java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 > failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > we need to have a mechanism wherein we can create a directory hierarchy and > limit the number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
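The last sentence of the description asks for a directory hierarchy that bounds the number of files per directory. Here is a hedged sketch of one such layout, mapping a sequential localized-resource id to a nested relative path; the cap of 32 and the scheme itself are illustrative, not the layout the eventual patch uses:
{code:java}
public class CacheDirLayout {
  private static final int FILES_PER_DIR = 32;   // hypothetical per-directory cap

  static String pathFor(long resourceId) {
    StringBuilder path = new StringBuilder();
    long n = resourceId / FILES_PER_DIR;         // which bucket of FILES_PER_DIR entries
    while (n > 0) {
      path.insert(0, (n % FILES_PER_DIR) + "/"); // one extra level per power of FILES_PER_DIR
      n /= FILES_PER_DIR;
    }
    return path + Long.toString(resourceId);
  }

  public static void main(String[] args) {
    System.out.println(pathFor(7));      // "7"          -> filecache/7
    System.out.println(pathFor(100));    // "3/100"      -> filecache/3/100
    System.out.println(pathFor(5000));   // "4/28/5000"  -> filecache/4/28/5000
  }
}
{code}
With a layout like this, each directory holds at most FILES_PER_DIR files plus a bounded number of subdirectories, so the unix per-directory limit is not hit no matter how many small resources are cached.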
[jira] [Commented] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
[ https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621390#comment-13621390 ] Hadoop QA commented on YARN-536: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576856/YARN-536.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/667//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/667//console This message is automatically generated. > Remove ContainerStatus, ContainerState from Container api interface as they > will not be called by the container object > -- > > Key: YARN-536 > URL: https://issues.apache.org/jira/browse/YARN-536 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-536.1.patch, YARN-536.2.patch > > > Remove containerstate, containerStatus from container interface. They will > not be called by container object -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621387#comment-13621387 ] Hadoop QA commented on YARN-193: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576857/YARN-193.14.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/668//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/668//console This message is automatically generated. > Scheduler.normalizeRequest does not account for allocation requests that > exceed maximumAllocation limits > - > > Key: YARN-193 > URL: https://issues.apache.org/jira/browse/YARN-193 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 3.0.0 >Reporter: Hitesh Shah >Assignee: Zhijie Shen > Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, > MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, > YARN-193.13.patch, YARN-193.14.patch, YARN-193.4.patch, YARN-193.5.patch, > YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
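For readers who have not followed the earlier patches, the behaviour the YARN-193 title asks for amounts to rounding a request up to the minimum allocation and rejecting (or capping) anything above the maximum allocation. A rough sketch under those assumptions; the numbers and the choice to throw rather than cap are illustrative and may not match the committed patch:
{code:java}
public class NormalizeRequestDemo {
  static int normalizeMemory(int requestedMb, int minMb, int maxMb) {
    int normalized = ((requestedMb + minMb - 1) / minMb) * minMb; // round up to a multiple of minMb
    if (normalized > maxMb) {
      throw new IllegalArgumentException(
          "Requested " + requestedMb + "MB exceeds maximum allocation " + maxMb + "MB");
    }
    return Math.max(normalized, minMb);
  }

  public static void main(String[] args) {
    System.out.println(normalizeMemory(1500, 1024, 8192));   // prints 2048
    try {
      normalizeMemory(9000, 1024, 8192);                      // beyond maximumAllocation
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());                     // the previously unaccounted-for case
    }
  }
}
{code}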
[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621385#comment-13621385 ] Vinod Kumar Vavilapalli commented on YARN-458: -- +1 for the patch after the fact. Thanks for doing this Sandy. > YARN daemon addresses must be placed in many different configs > -- > > Key: YARN-458 > URL: https://issues.apache.org/jira/browse/YARN-458 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.0.5-beta > > Attachments: YARN-458.patch > > > The YARN resourcemanager's address is included in four different configs: > yarn.resourcemanager.scheduler.address, > yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, > and yarn.resourcemanager.admin.address > A new user trying to configure a cluster needs to know the names of all these > four configs. > The same issue exists for nodemanagers. > It would be much easier if they could simply specify > yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports > for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
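The fallback behaviour the description asks for can be pictured as deriving each service address from the single hostname key plus a default port whenever the specific address key is unset. A sketch using the property names from the description; the helper, the 8030 port, and the lookup order are illustrative rather than the actual YarnConfiguration logic:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class RmAddressDefaults {
  static String schedulerAddress(Configuration conf) {
    String explicit = conf.get("yarn.resourcemanager.scheduler.address");
    if (explicit != null) {
      return explicit;                                     // a fully specified address wins
    }
    String host = conf.get("yarn.resourcemanager.hostname", "0.0.0.0");
    return host + ":8030";                                 // 8030 is the conventional scheduler port
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("yarn.resourcemanager.hostname", "rm.example.com");
    System.out.println(schedulerAddress(conf));            // rm.example.com:8030
  }
}
{code}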
[jira] [Updated] (YARN-539) LocalizedResources are leaked in memory in case resource localization fails
[ https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-539: - Summary: LocalizedResources are leaked in memory in case resource localization fails (was: Memory leak in case resource localization fails. LocalizedResource remains in memory.) > LocalizedResources are leaked in memory in case resource localization fails > --- > > Key: YARN-539 > URL: https://issues.apache.org/jira/browse/YARN-539 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > > If resource localization fails then resource remains in memory and is > 1) Either cleaned up when next time cache cleanup runs and there is space > crunch. (If sufficient space in cache is available then it will remain in > memory). > 2) reused if LocalizationRequest comes again for the same resource. > I think when resource localization fails then that event should be sent to > LocalResourceTracker which will then remove it from its cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-537) Waiting containers are not informed if private localization for a resource fails.
[ https://issues.apache.org/jira/browse/YARN-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621381#comment-13621381 ] Vinod Kumar Vavilapalli commented on YARN-537: -- Yup, I put in a comment long (long) time back asking why it isn't getting informed through the LocalizedResource which knows about all the waiting containers. I think we should do that. > Waiting containers are not informed if private localization for a resource > fails. > - > > Key: YARN-537 > URL: https://issues.apache.org/jira/browse/YARN-537 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > > In ResourceLocalizationService.LocalizerRunner.update() if localization fails > then all the other waiting containers are not informed only the initiator is > informed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-193: - Attachment: YARN-193.14.patch Fixed the buggy test TestResourceManager#testResourceManagerInitConfigValidation > Scheduler.normalizeRequest does not account for allocation requests that > exceed maximumAllocation limits > - > > Key: YARN-193 > URL: https://issues.apache.org/jira/browse/YARN-193 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 3.0.0 >Reporter: Hitesh Shah >Assignee: Zhijie Shen > Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, > MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, > YARN-193.13.patch, YARN-193.14.patch, YARN-193.4.patch, YARN-193.5.patch, > YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
[ https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-536: --- Attachment: YARN-536.2.patch Fix the bug.. > Remove ContainerStatus, ContainerState from Container api interface as they > will not be called by the container object > -- > > Key: YARN-536 > URL: https://issues.apache.org/jira/browse/YARN-536 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-536.1.patch, YARN-536.2.patch > > > Remove containerstate, containerStatus from container interface. They will > not be called by container object -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
[ https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621343#comment-13621343 ] Hadoop QA commented on YARN-536: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576843/YARN-536.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/664//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/664//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/664//console This message is automatically generated. > Remove ContainerStatus, ContainerState from Container api interface as they > will not be called by the container object > -- > > Key: YARN-536 > URL: https://issues.apache.org/jira/browse/YARN-536 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-536.1.patch > > > Remove containerstate, containerStatus from container interface. They will > not be called by container object -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621339#comment-13621339 ] Hadoop QA commented on YARN-99: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576840/yarn-99-20130403.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/666//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/666//console This message is automatically generated. > Jobs fail during resource localization when private distributed-cache hits > unix directory limits > > > Key: YARN-99 > URL: https://issues.apache.org/jira/browse/YARN-99 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Devaraj K >Assignee: Omkar Vinit Joshi > Attachments: yarn-99-20130324.patch, yarn-99-20130403.1.patch, > yarn-99-20130403.patch > > > If we have multiple jobs which use the distributed cache with many small > files, the directory limit is reached before the cache size limit and localization fails > to create any new directories in the file cache. The jobs start failing with the > below exception.
> {code:xml} > java.io.IOException: mkdir of > /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code} > We should have a mechanism to clean the cache files if it crosses specified > number of directories like cache size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621338#comment-13621338 ] Hadoop QA commented on YARN-412: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576838/YARN-412.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/665//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/665//console This message is automatically generated. > FifoScheduler incorrectly checking for node locality > > > Key: YARN-412 > URL: https://issues.apache.org/jira/browse/YARN-412 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Roger Hoover >Assignee: Roger Hoover >Priority: Minor > Labels: patch > Attachments: YARN-412.patch > > > In the FifoScheduler, the assignNodeLocalContainers method is checking if the > data is local to a node by searching for the nodeAddress of the node in the > set of outstanding requests for the app. This seems to be incorrect as it > should be checking hostname instead. The offending line of code is 455: > application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); > Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses > are a concatenation of hostname and command port (e.g. host1.foo.com:1234). > In the CapacityScheduler, it's done using hostname. See > LeafQueue.assignNodeLocalContainers, line 1129 > application.getResourceRequest(priority, node.getHostName()); > Note that this bug does not affect the actual scheduling decisions made by > the FifoScheduler because even though it incorrectly determines that a request > is not local to the node, it will still schedule the request immediately > because it's rack-local. However, this bug may be adversely affecting the > reporting of job status by underreporting the number of tasks that were node > local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621333#comment-13621333 ] Hudson commented on YARN-458: - Integrated in Hadoop-trunk-Commit #3556 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3556/]) YARN-458. YARN daemon addresses must be placed in many different configs. (sandyr via tucu) (Revision 1464204) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464204 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml > YARN daemon addresses must be placed in many different configs > -- > > Key: YARN-458 > URL: https://issues.apache.org/jira/browse/YARN-458 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.0.5-beta > > Attachments: YARN-458.patch > > > The YARN resourcemanager's address is included in four different configs: > yarn.resourcemanager.scheduler.address, > yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, > and yarn.resourcemanager.admin.address > A new user trying to configure a cluster needs to know the names of all these > four configs. > The same issue exists for nodemanagers. > It would be much easier if they could simply specify > yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports > for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-538) RM address DNS lookup can cause unnecessary slowness on every JHS page load
[ https://issues.apache.org/jira/browse/YARN-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621323#comment-13621323 ] Hudson commented on YARN-538: - Integrated in Hadoop-trunk-Commit #3555 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3555/]) YARN-538. RM address DNS lookup can cause unnecessary slowness on every JHS page load. (sandyr via tucu) (Revision 1464197) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464197 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > RM address DNS lookup can cause unnecessary slowness on every JHS page load > > > Key: YARN-538 > URL: https://issues.apache.org/jira/browse/YARN-538 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.0.5-beta > > Attachments: MAPREDUCE-5111.patch > > > When I run the job history server locally, every page load takes tens > of seconds. I profiled the process and discovered that all the extra time > was spent inside YarnConfiguration#getRMWebAppURL, trying to resolve 0.0.0.0 > to a hostname. When I changed my yarn.resourcemanager.address to localhost, > the page load times decreased drastically. > There's no reason that we need to perform this resolution on every page load. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
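The slowness described above is a classic case for memoizing an expensive lookup. A sketch of that idea in isolation; it is not the change that went into YarnConfiguration#getRMWebAppURL, and resolveRmWebAppUrl() here is a hypothetical stand-in for the real hostname resolution:
{code:java}
public class RmWebAppUrlCache {
  private static volatile String cachedUrl;                 // resolved once, reused afterwards

  static String getRmWebAppUrl() {
    String url = cachedUrl;
    if (url == null) {
      url = resolveRmWebAppUrl();                           // the expensive DNS/host resolution
      cachedUrl = url;
    }
    return url;
  }

  private static String resolveRmWebAppUrl() {
    // placeholder for the real lookup (resolving the configured RM address to a hostname)
    return "http://rm.example.com:8088";
  }

  public static void main(String[] args) {
    System.out.println(getRmWebAppUrl());                   // slow only the first time
    System.out.println(getRmWebAppUrl());                   // served from the cache
  }
}
{code}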
[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
[ https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621324#comment-13621324 ] Hudson commented on YARN-516: - Integrated in Hadoop-trunk-Commit #3555 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3555/]) Revert YARN-516 per HADOOP-9357. (Revision 1464181) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464181 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java > TestContainerLocalizer.testContainerLocalizerMain is failing > > > Key: YARN-516 > URL: https://issues.apache.org/jira/browse/YARN-516 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Andrew Wang > Fix For: 2.0.5-beta > > Attachments: YARN-516.txt > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621319#comment-13621319 ] Alejandro Abdelnur commented on YARN-458: - +1. Do we need to do this for HS as well? If so please open a new JIRA. > YARN daemon addresses must be placed in many different configs > -- > > Key: YARN-458 > URL: https://issues.apache.org/jira/browse/YARN-458 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-458.patch > > > The YARN resourcemanager's address is included in four different configs: > yarn.resourcemanager.scheduler.address, > yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, > and yarn.resourcemanager.admin.address > A new user trying to configure a cluster needs to know the names of all these > four configs. > The same issue exists for nodemanagers. > It would be much easier if they could simply specify > yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports > for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-539) Memory leak in case resource localization fails. LocalizedResource remains in memory.
Omkar Vinit Joshi created YARN-539: -- Summary: Memory leak in case resource localization fails. LocalizedResource remains in memory. Key: YARN-539 URL: https://issues.apache.org/jira/browse/YARN-539 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi If resource localization fails then resource remains in memory and is 1) Either cleaned up when next time cache cleanup runs and there is space crunch. (If sufficient space in cache is available then it will remain in memory). 2) reused if LocalizationRequest comes again for the same resource. I think when resource localization fails then that event should be sent to LocalResourceTracker which will then remove it from its cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-538) RM address DNS lookup can cause unnecessary slowness on every JHS page load
[ https://issues.apache.org/jira/browse/YARN-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur moved MAPREDUCE-5111 to YARN-538: Component/s: (was: jobhistoryserver) Affects Version/s: (was: 2.0.3-alpha) 2.0.3-alpha Key: YARN-538 (was: MAPREDUCE-5111) Project: Hadoop YARN (was: Hadoop Map/Reduce) > RM address DNS lookup can cause unnecessary slowness on every JHS page load > > > Key: YARN-538 > URL: https://issues.apache.org/jira/browse/YARN-538 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5111.patch > > > When I run the job history server locally, every page load takes tens > of seconds. I profiled the process and discovered that all the extra time > was spent inside YarnConfiguration#getRMWebAppURL, trying to resolve 0.0.0.0 > to a hostname. When I changed my yarn.resourcemanager.address to localhost, > the page load times decreased drastically. > There's no reason that we need to perform this resolution on every page load. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-537) Waiting containers are not informed if private localization for a resource fails.
Omkar Vinit Joshi created YARN-537: -- Summary: Waiting containers are not informed if private localization for a resource fails. Key: YARN-537 URL: https://issues.apache.org/jira/browse/YARN-537 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi In ResourceLocalizationService.LocalizerRunner.update() if localization fails then all the other waiting containers are not informed only the initiator is informed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
[ https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-536: --- Attachment: YARN-536.1.patch > Remove ContainerStatus, ContainerState from Container api interface as they > will not be called by the container object > -- > > Key: YARN-536 > URL: https://issues.apache.org/jira/browse/YARN-536 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-536.1.patch > > > Remove containerstate, containerStatus from container interface. They will > not be called by container object -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-99: -- Attachment: yarn-99-20130403.1.patch > Jobs fail during resource localization when private distributed-cache hits > unix directory limits > > > Key: YARN-99 > URL: https://issues.apache.org/jira/browse/YARN-99 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Devaraj K >Assignee: Omkar Vinit Joshi > Attachments: yarn-99-20130324.patch, yarn-99-20130403.1.patch, > yarn-99-20130403.patch > > > If we have multiple jobs which use the distributed cache with many small > files, the directory limit is reached before the cache size limit and localization fails > to create any new directories in the file cache. The jobs start failing with the > below exception. > {code:xml} > java.io.IOException: mkdir of > /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code} > We should have a mechanism to clean the cache files when they cross a specified > number of directories, just as we do for cache size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated YARN-412: -- Attachment: YARN-412.patch > FifoScheduler incorrectly checking for node locality > > > Key: YARN-412 > URL: https://issues.apache.org/jira/browse/YARN-412 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Roger Hoover >Assignee: Roger Hoover >Priority: Minor > Labels: patch > Attachments: YARN-412.patch > > > In the FifoScheduler, the assignNodeLocalContainers method is checking if the > data is local to a node by searching for the nodeAddress of the node in the > set of outstanding requests for the app. This seems to be incorrect as it > should be checking hostname instead. The offending line of code is 455: > application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); > Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses > are a concatenation of hostname and command port (e.g. host1.foo.com:1234). > In the CapacityScheduler, it's done using hostname. See > LeafQueue.assignNodeLocalContainers, line 1129 > application.getResourceRequest(priority, node.getHostName()); > Note that this bug does not affect the actual scheduling decisions made by > the FifoScheduler because even though it incorrectly determines that a request > is not local to the node, it will still schedule the request immediately > because it's rack-local. However, this bug may be adversely affecting the > reporting of job status by underreporting the number of tasks that were node > local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated YARN-412: -- Attachment: (was: YARN-412.patch) > FifoScheduler incorrectly checking for node locality > > > Key: YARN-412 > URL: https://issues.apache.org/jira/browse/YARN-412 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Roger Hoover >Assignee: Roger Hoover >Priority: Minor > Labels: patch > Attachments: YARN-412.patch > > > In the FifoScheduler, the assignNodeLocalContainers method is checking if the > data is local to a node by searching for the nodeAddress of the node in the > set of outstanding requests for the app. This seems to be incorrect as it > should be checking hostname instead. The offending line of code is 455: > application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); > Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses > are a concatenation of hostname and command port (e.g. host1.foo.com:1234). > In the CapacityScheduler, it's done using hostname. See > LeafQueue.assignNodeLocalContainers, line 1129 > application.getResourceRequest(priority, node.getHostName()); > Note that this bug does not affect the actual scheduling decisions made by > the FifoScheduler because even though it incorrectly determines that a request > is not local to the node, it will still schedule the request immediately > because it's rack-local. However, this bug may be adversely affecting the > reporting of job status by underreporting the number of tasks that were node > local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
[ https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621255#comment-13621255 ] Eli Collins commented on YARN-516: -- I reverted this change (and the initial HADOOP-9357 patch). We'll put this fix back in the HADOOP-9357 patch if we do another rev. > TestContainerLocalizer.testContainerLocalizerMain is failing > > > Key: YARN-516 > URL: https://issues.apache.org/jira/browse/YARN-516 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Andrew Wang > Fix For: 2.0.5-beta > > Attachments: YARN-516.txt > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621243#comment-13621243 ] Hadoop QA commented on YARN-99: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576823/yarn-99-20130403.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/663//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/663//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/663//console This message is automatically generated. > Jobs fail during resource localization when private distributed-cache hits > unix directory limits > > > Key: YARN-99 > URL: https://issues.apache.org/jira/browse/YARN-99 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Devaraj K >Assignee: Omkar Vinit Joshi > Attachments: yarn-99-20130324.patch, yarn-99-20130403.patch > > > If we have multiple jobs which uses distributed cache with small size of > files, the directory limit reaches before reaching the cache size and fails > to create any directories in file cache. The jobs start failing with the > below exception. 
> {code:xml} > java.io.IOException: mkdir of > /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code} > We should have a mechanism to clean the cache files if it crosses specified > number of directories like cache size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621238#comment-13621238 ] Hadoop QA commented on YARN-193: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576820/YARN-193.13.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/662//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/662//console This message is automatically generated. > Scheduler.normalizeRequest does not account for allocation requests that > exceed maximumAllocation limits > - > > Key: YARN-193 > URL: https://issues.apache.org/jira/browse/YARN-193 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 3.0.0 >Reporter: Hitesh Shah >Assignee: Zhijie Shen > Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, > MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, > YARN-193.13.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, > YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-425) coverage fix for yarn api
[ https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621218#comment-13621218 ] Hadoop QA commented on YARN-425: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576764/YARN-425-trunk-b.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/661//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/661//console This message is automatically generated. > coverage fix for yarn api > - > > Key: YARN-425 > URL: https://issues.apache.org/jira/browse/YARN-425 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-425-branch-0.23.patch, YARN-425-branch-2-b.patch, > YARN-425-branch-2.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, > YARN-425-trunk.patch > > > coverage fix for yarn api > patch YARN-425-trunk-a.patch for trunk > patch YARN-425-branch-2.patch for branch-2 > patch YARN-425-branch-0.23.patch for branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621215#comment-13621215 ] Hadoop QA commented on YARN-465: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576782/YARN-465-trunk-a.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/659//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/659//console This message is automatically generated. > fix coverage org.apache.hadoop.yarn.server.webproxy > > > Key: YARN-465 > URL: https://issues.apache.org/jira/browse/YARN-465 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-465-branch-0.23-a.patch, > YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, > YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch > > > fix coverage org.apache.hadoop.yarn.server.webproxy > patch YARN-465-trunk.patch for trunk > patch YARN-465-branch-2.patch for branch-2 > patch YARN-465-branch-0.23.patch for branch-0.23 > There is issue in branch-0.23 . Patch does not creating .keep file. > For fix it need to run commands: > mkdir > yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy > touch > yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-381) Improve FS docs
[ https://issues.apache.org/jira/browse/YARN-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621203#comment-13621203 ] Hudson commented on YARN-381: - Integrated in Hadoop-trunk-Commit #3554 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3554/]) YARN-381. Improve fair scheduler docs. Contributed by Sandy Ryza. (Revision 1464130) Result = SUCCESS tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464130 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm > Improve FS docs > --- > > Key: YARN-381 > URL: https://issues.apache.org/jira/browse/YARN-381 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Sandy Ryza >Priority: Minor > Fix For: 2.0.5-beta > > Attachments: YARN-381.patch > > > The MR2 FS docs could use some improvements. > Configuration: > - sizebasedweight - what is the "size" here? Total memory usage? > Pool properties: > - minResources - what does min amount of aggregate memory mean given that > this is not a reservation? > - maxResources - is this a hard limit? > - weight: How is this ratio configured? Eg base is 1 and all weights are > relative to that? > - schedulingMode - what is the default? Is fifo pure fifo, eg waits until all > tasks for the job are finished before launching the next job? > There's no mention of ACLs, even though they're supported. See the CS docs > for comparison. > Also there are a couple typos worth fixing while we're at it, eg "finish. > apps to run" > Worth keeping in mind that some of these will need to be updated to reflect > that resource calculators are now pluggable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621204#comment-13621204 ] Hadoop QA commented on YARN-427: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576767/YARN-427-trunk-a.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/660//console This message is automatically generated. > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, > YARN-427-trunk-a.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-101) If the heartbeat message is lost, the node status info of completed containers will be lost too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621201#comment-13621201 ] Hudson commented on YARN-101: - Integrated in Hadoop-trunk-Commit #3554 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3554/]) YARN-101. Fix NodeManager heartbeat processing to not lose track of completed containers in case of dropped heartbeats. Contributed by Xuan Gong. (Revision 1464105) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464105 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java > If the heartbeat message loss, the nodestatus info of complete container > will loss too. > > > Key: YARN-101 > URL: https://issues.apache.org/jira/browse/YARN-101 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Environment: suse. >Reporter: xieguiming >Assignee: Xuan Gong >Priority: Minor > Fix For: 2.0.5-beta > > Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, > YARN-101.4.patch, YARN-101.5.patch, YARN-101.6.patch > > > see the red color: > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java > protected void startStatusUpdater() { > new Thread("Node Status Updater") { > @Override > @SuppressWarnings("unchecked") > public void run() { > int lastHeartBeatID = 0; > while (!isStopped) { > // Send heartbeat > try { > synchronized (heartbeatMonitor) { > heartbeatMonitor.wait(heartBeatInterval); > } > {color:red} > // Before we send the heartbeat, we get the NodeStatus, > // whose method removes completed containers. > NodeStatus nodeStatus = getNodeStatus(); > {color} > nodeStatus.setResponseId(lastHeartBeatID); > > NodeHeartbeatRequest request = recordFactory > .newRecordInstance(NodeHeartbeatRequest.class); > request.setNodeStatus(nodeStatus); > {color:red} >// But if the nodeHeartbeat fails, we've already removed the > containers away to know about it. We aren't handling a nodeHeartbeat failure > case here. 
> HeartbeatResponse response = > resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); >{color} > if (response.getNodeAction() == NodeAction.SHUTDOWN) { > LOG > .info("Recieved SHUTDOWN signal from Resourcemanager as > part of heartbeat," + > " hence shutting down."); > NodeStatusUpdaterImpl.this.stop(); > break; > } > if (response.getNodeAction() == NodeAction.REBOOT) { > LOG.info("Node is out of sync with ResourceManager," > + " hence rebooting."); > NodeStatusUpdaterImpl.this.reboot(); > break; > } > lastHeartBeatID = response.getResponseId(); > List containersToCleanup = response > .getContainersToCleanupList(); > if (containersToCleanup.size() != 0) { > dispatcher.getEventHandler().handle( > new CMgrCompletedContainersEvent(containersToCleanup)); > } > List appsToCleanup = > response.getApplicationsToCleanupList(); > //Only start tracking for keepAlive on FINISH_APP > trackAppsForKeepAlive(appsToCleanup); > if (appsToCleanup.size() != 0) { > dispatcher.getEventHandler().handle( > new CMgrCompletedAppsEvent(appsToCleanup)); > } > } catch (Throwable e) { > // TODO Better error handling. Thread can die with the rest of the > // NM still running. > LOG.error("Caught exception in status-updater", e); > } > } > } > }.start(); > } > private NodeStatus getNodeStatus() { > NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.cla
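The code quoted in the report above removes completed containers from the NM context while the NodeStatus is being built, before the heartbeat RPC is known to have succeeded, so a dropped heartbeat loses those reports. A minimal sketch of the general remedy — buffering completed-container statuses until an acknowledged heartbeat — is below; the class and method names are illustrative only and are not the identifiers used in the committed YARN-101 patch.

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch only: keep completed-container reports until a
 * heartbeat that carried them has actually been acknowledged by the RM,
 * instead of forgetting them as soon as the NodeStatus is built.
 */
class CompletedContainerBuffer {

  /** Reports sent at least once but not yet acknowledged, keyed by container id. */
  private final Map<String, String> pending = new HashMap<String, String>();

  /** Called while building the NodeStatus for the next heartbeat. */
  synchronized List<String> snapshotForHeartbeat(Map<String, String> completedNow) {
    pending.putAll(completedNow);                  // remember, do not discard
    return new ArrayList<String>(pending.values());
  }

  /** Called only after resourceTracker.nodeHeartbeat(request) returned normally. */
  synchronized void onHeartbeatAcked(List<String> ackedContainerIds) {
    for (String id : ackedContainerIds) {
      pending.remove(id);                          // now safe to forget
    }
  }

  /** If the RPC threw, nothing is removed; the next heartbeat resends everything. */
  synchronized void onHeartbeatFailed() {
    // no-op by design: pending reports survive the dropped heartbeat
  }
}
{code}

The committed fix works inside NodeStatusUpdaterImpl itself (per the file list above); the essential point is only that removal happens after resourceTracker.nodeHeartbeat(...) returns, not before.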
[jira] [Commented] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
[ https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621180#comment-13621180 ] Xuan Gong commented on YARN-536: Remove getter and setter for ContainerState, ContainerStatus from container interface, remove those contents from proto file. There are some test code which used the getter and setter to get containerState or containerStatus from container object. /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeManager.java /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java > Remove ContainerStatus, ContainerState from Container api interface as they > will not be called by the container object > -- > > Key: YARN-536 > URL: https://issues.apache.org/jira/browse/YARN-536 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > > Remove containerstate, containerStatus from container interface. They will > not be called by container object -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-404) Node Manager leaks Data Node connections
[ https://issues.apache.org/jira/browse/YARN-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-404: - Priority: Major (was: Blocker) Moving it off blocker status. Devaraj, can you give us more information. Is this still happening? Tx. > Node Manager leaks Data Node connections > > > Key: YARN-404 > URL: https://issues.apache.org/jira/browse/YARN-404 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.0.2-alpha, 0.23.6 >Reporter: Devaraj K >Assignee: Devaraj K > > RM is missing to give some applications to NM for clean up, due to this log > aggregation is not happening for those applications and also it is leaking > data node connections in NM side. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
Xuan Gong created YARN-536: -- Summary: Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object Key: YARN-536 URL: https://issues.apache.org/jira/browse/YARN-536 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Remove containerstate, containerStatus from container interface. They will not be called by container object -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-536) Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object
[ https://issues.apache.org/jira/browse/YARN-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-536: -- Assignee: Xuan Gong > Remove ContainerStatus, ContainerState from Container api interface as they > will not be called by the container object > -- > > Key: YARN-536 > URL: https://issues.apache.org/jira/browse/YARN-536 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > > Remove containerstate, containerStatus from container interface. They will > not be called by container object -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-514: Assignee: Zhijie Shen (was: Bikas Saha) > Delayed store operations should not result in RM unavailability for app > submission > -- > > Key: YARN-514 > URL: https://issues.apache.org/jira/browse/YARN-514 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Zhijie Shen > > Currently, app submission is the only store operation performed synchronously > because the app must be stored before the request returns with success. This > makes the RM susceptible to blocking all client threads on slow store > operations, resulting in RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-99: -- Attachment: yarn-99-20130403.patch > Jobs fail during resource localization when private distributed-cache hits > unix directory limits > > > Key: YARN-99 > URL: https://issues.apache.org/jira/browse/YARN-99 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Devaraj K >Assignee: Omkar Vinit Joshi > Attachments: yarn-99-20130324.patch, yarn-99-20130403.patch > > > If we have multiple jobs which uses distributed cache with small size of > files, the directory limit reaches before reaching the cache size and fails > to create any directories in file cache. The jobs start failing with the > below exception. > {code:xml} > java.io.IOException: mkdir of > /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code} > We should have a mechanism to clean the cache files if it crosses specified > number of directories like cache size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621128#comment-13621128 ] Omkar Vinit Joshi commented on YARN-99: --- Rebasing the patch as 467 is now committed. This issue is related to 467 and the detailed information can be found here [underlying problem and proposed/implemented Solution | https://issues.apache.org/jira/browse/YARN-467?focusedCommentId=13615894&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13615894] The only difference here is that the same problem is present in /usercache//filecache (Private user cache). We are using LocalCacheDirectoryManager for user-cache but not for app-cache as it is highly unlikely for application to have so many localized files. Earlier implementation for private cache involved computing localized path inside ContainerLocalizer; i.e. in different processes. Now in order to centralize this we have moved it to ResourceLocalizationService.LocalizerRunner and this is communicated to all the ContainerLocalizer as a part of the heartbeat. Thereby we can now manage LocalCacheDirectory at one place. > Jobs fail during resource localization when private distributed-cache hits > unix directory limits > > > Key: YARN-99 > URL: https://issues.apache.org/jira/browse/YARN-99 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0, 2.0.0-alpha >Reporter: Devaraj K >Assignee: Omkar Vinit Joshi > Attachments: yarn-99-20130324.patch > > > If we have multiple jobs which uses distributed cache with small size of > files, the directory limit reaches before reaching the cache size and fails > to create any directories in file cache. The jobs start failing with the > below exception. > {code:xml} > java.io.IOException: mkdir of > /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code} > We should have a mechanism to clean the cache files if it crosses specified > number of directories like cache size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
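As a rough illustration of the per-directory cap these JIRAs enforce (not the actual LocalCacheDirectoryManager API, whose path scheme and limit handling differ), here is a sketch that hands out numbered sub-directories so that no single directory exceeds a configured number of localized files:

{code}
/**
 * Illustrative only: spill localized resources into numbered sub-directories
 * ("0", "1", "2", ...) so that no single filecache directory accumulates more
 * than perDirLimit entries and hits the filesystem's directory limit.
 */
class CacheDirAllocator {

  private final int perDirLimit;   // e.g. 8192, analogous to the YARN default cap
  private long filesAllocated = 0;

  CacheDirAllocator(int perDirLimit) {
    this.perDirLimit = perDirLimit;
  }

  /** Relative sub-directory under the private filecache root for the next resource. */
  synchronized String nextSubDirectory() {
    long bucket = filesAllocated++ / perDirLimit;  // each bucket holds perDirLimit files
    return Long.toString(bucket);
  }
}
{code}

With a limit of 8192, each bucket stays far below the 32K entry limit mentioned in YARN-527, and the parent directory only gains one new sub-directory per 8192 localized files.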
[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-193: - Attachment: YARN-193.13.patch Fix the twice setting bug and change default max vcores to 4. > Scheduler.normalizeRequest does not account for allocation requests that > exceed maximumAllocation limits > - > > Key: YARN-193 > URL: https://issues.apache.org/jira/browse/YARN-193 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 3.0.0 >Reporter: Hitesh Shah >Assignee: Zhijie Shen > Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, > MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, > YARN-193.13.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, > YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621114#comment-13621114 ] Sandy Ryza commented on YARN-458: - Verified on a pseudo-distributed cluster that both the old and new configs work. > YARN daemon addresses must be placed in many different configs > -- > > Key: YARN-458 > URL: https://issues.apache.org/jira/browse/YARN-458 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-458.patch > > > The YARN resourcemanager's address is included in four different configs: > yarn.resourcemanager.scheduler.address, > yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, > and yarn.resourcemanager.admin.address > A new user trying to configure a cluster needs to know the names of all these > four configs. > The same issue exists for nodemanagers. > It would be much easier if they could simply specify > yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports > for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
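For readers following along, the simplification being verified here lets a pseudo-distributed setup specify only host names and rely on default ports for the individual service addresses. A hedged example of what such a yarn-site.xml might look like, assuming the patch wires the four RM addresses to yarn.resourcemanager.hostname as the description proposes:

{code:xml}
<configuration>
  <!-- One host name; the scheduler, resource-tracker, admin and client
       addresses then default to this host with their standard ports. -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm.example.com</value>
  </property>
  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>0.0.0.0</value>
  </property>
</configuration>
{code}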
[jira] [Updated] (YARN-248) Security related work for RM restart
[ https://issues.apache.org/jira/browse/YARN-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-248: - Summary: Security related work for RM restart (was: Restore RMDelegationTokenSecretManager state on restart) > Security related work for RM restart > > > Key: YARN-248 > URL: https://issues.apache.org/jira/browse/YARN-248 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tom White >Assignee: Bikas Saha > > On restart, the RM creates a new RMDelegationTokenSecretManager with fresh > state. This will cause problems for Oozie jobs running on secure clusters > since the delegation tokens stored in the job credentials (used by the Oozie > launcher job to submit a job to the RM) will not be recognized by the RM, and > recovery will fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-430) Add HDFS based store for RM which manages the store using directories
[ https://issues.apache.org/jira/browse/YARN-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-430: - Summary: Add HDFS based store for RM which manages the store using directories (was: Add HDFS based store for RM) > Add HDFS based store for RM which manages the store using directories > - > > Key: YARN-430 > URL: https://issues.apache.org/jira/browse/YARN-430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Jian He > > There is a generic FileSystem store but it does not take advantage of HDFS > features like directories, replication, DFSClient advanced settings for HA, > retries etc. Writing a store thats optimized for HDFS would be good. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621073#comment-13621073 ] Bikas Saha commented on YARN-193: - These values need to be on the conservative side so that they work on most installations. Given 24-32GB memory is becoming baseline nowadays 8GB default for max is ok IMO. Given 16 cores becoming baseline nowadays 4 cores sounds like a good default for max IMO. This is per container and its not easy to write code that actually maxes out 8 cores :P > Scheduler.normalizeRequest does not account for allocation requests that > exceed maximumAllocation limits > - > > Key: YARN-193 > URL: https://issues.apache.org/jira/browse/YARN-193 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 3.0.0 >Reporter: Hitesh Shah >Assignee: Zhijie Shen > Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, > MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, > YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, > YARN-193.8.patch, YARN-193.9.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
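For context, the maximums being debated are per-container scheduler limits. Expressed as configuration — property names as they appear in current YARN releases, values being the defaults suggested in this comment rather than necessarily what the final patch committed — they would read:

{code:xml}
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>   <!-- 8 GB maximum memory per container -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value>      <!-- 4 virtual cores maximum per container -->
</property>
{code}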
[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621044#comment-13621044 ] Vinod Kumar Vavilapalli commented on YARN-101: -- Looks much better, +1, checking it in. > If the heartbeat message loss, the nodestatus info of complete container > will loss too. > > > Key: YARN-101 > URL: https://issues.apache.org/jira/browse/YARN-101 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Environment: suse. >Reporter: xieguiming >Assignee: Xuan Gong >Priority: Minor > Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, > YARN-101.4.patch, YARN-101.5.patch, YARN-101.6.patch > > > see the red color: > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java > protected void startStatusUpdater() { > new Thread("Node Status Updater") { > @Override > @SuppressWarnings("unchecked") > public void run() { > int lastHeartBeatID = 0; > while (!isStopped) { > // Send heartbeat > try { > synchronized (heartbeatMonitor) { > heartbeatMonitor.wait(heartBeatInterval); > } > {color:red} > // Before we send the heartbeat, we get the NodeStatus, > // whose method removes completed containers. > NodeStatus nodeStatus = getNodeStatus(); > {color} > nodeStatus.setResponseId(lastHeartBeatID); > > NodeHeartbeatRequest request = recordFactory > .newRecordInstance(NodeHeartbeatRequest.class); > request.setNodeStatus(nodeStatus); > {color:red} >// But if the nodeHeartbeat fails, we've already removed the > containers away to know about it. We aren't handling a nodeHeartbeat failure > case here. > HeartbeatResponse response = > resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); >{color} > if (response.getNodeAction() == NodeAction.SHUTDOWN) { > LOG > .info("Recieved SHUTDOWN signal from Resourcemanager as > part of heartbeat," + > " hence shutting down."); > NodeStatusUpdaterImpl.this.stop(); > break; > } > if (response.getNodeAction() == NodeAction.REBOOT) { > LOG.info("Node is out of sync with ResourceManager," > + " hence rebooting."); > NodeStatusUpdaterImpl.this.reboot(); > break; > } > lastHeartBeatID = response.getResponseId(); > List containersToCleanup = response > .getContainersToCleanupList(); > if (containersToCleanup.size() != 0) { > dispatcher.getEventHandler().handle( > new CMgrCompletedContainersEvent(containersToCleanup)); > } > List appsToCleanup = > response.getApplicationsToCleanupList(); > //Only start tracking for keepAlive on FINISH_APP > trackAppsForKeepAlive(appsToCleanup); > if (appsToCleanup.size() != 0) { > dispatcher.getEventHandler().handle( > new CMgrCompletedAppsEvent(appsToCleanup)); > } > } catch (Throwable e) { > // TODO Better error handling. Thread can die with the rest of the > // NM still running. 
> LOG.error("Caught exception in status-updater", e); > } > } > } > }.start(); > } > private NodeStatus getNodeStatus() { > NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); > nodeStatus.setNodeId(this.nodeId); > int numActiveContainers = 0; > List containersStatuses = new > ArrayList(); > for (Iterator> i = > this.context.getContainers().entrySet().iterator(); i.hasNext();) { > Entry e = i.next(); > ContainerId containerId = e.getKey(); > Container container = e.getValue(); > // Clone the container to send it to the RM > org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = > container.cloneAndGetContainerStatus(); > containersStatuses.add(containerStatus); > ++numActiveContainers; > LOG.info("Sending out status for container: " + containerStatus); > {color:red} > // Here is the part that removes the completed containers. > if (containerStatus.getState() == ContainerState.COMPLETE) { > // Remove > i.remove(); > {color} > LOG.info("Removed completed container " + container
[jira] [Commented] (YARN-527) Local filecache mkdir fails
[ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621042#comment-13621042 ] Vinod Kumar Vavilapalli commented on YARN-527: -- If it is the 32K limit that caused it, the timing can't be more perfect. I just committed YARN-467 which addresses it for public cache, and YARN-99 is in progress which takes care of private cache. These two JIRAs enforce a limit in YARN itself, default is 8192. Looking back again at your stack trace, I agree that it is very likely you are hitting the 32K limit. Can I close this as a duplicate of YARN-467? You can verify the fix on 2.0.5-beta when it is out. > Local filecache mkdir fails > --- > > Key: YARN-527 > URL: https://issues.apache.org/jira/browse/YARN-527 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.0-alpha > Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes > and six worker nodes. >Reporter: Knut O. Hellan >Priority: Minor > Attachments: yarn-site.xml > > > Jobs failed with no other explanation than this stack trace: > 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diag > nostics report from attempt_1364591875320_0017_m_00_0: > java.io.IOException: mkdir of /disk3/yarn/local/filecache/-42307893 > 55400878397 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Manually creating the directory worked. This behavior was common to at least > several nodes in the cluster. > The situation was resolved by removing and recreating all > /disk?/yarn/local/filecache directories on all nodes. > It is unclear whether Yarn struggled with the number of files or if there > were corrupt files in the caches. The situation was triggered by a node dying. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated YARN-412: -- Attachment: (was: YARN-412.patch) > FifoScheduler incorrectly checking for node locality > > > Key: YARN-412 > URL: https://issues.apache.org/jira/browse/YARN-412 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Roger Hoover >Assignee: Roger Hoover >Priority: Minor > Labels: patch > Attachments: YARN-412.patch > > > In the FifoScheduler, the assignNodeLocalContainers method is checking if the > data is local to a node by searching for the nodeAddress of the node in the > set of outstanding requests for the app. This seems to be incorrect as it > should be checking hostname instead. The offending line of code is 455: > application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); > Requests are formated by hostname (e.g. host1.foo.com) whereas node addresses > are a concatenation of hostname and command port (e.g. host1.foo.com:1234) > In the CapacityScheduler, it's done using hostname. See > LeafQueue.assignNodeLocalContainers, line 1129 > application.getResourceRequest(priority, node.getHostName()); > Note that this bug does not affect the actual scheduling decisions made by > the FifoScheduler because even though it incorrect determines that a request > is not local to the node, it will still schedule the request immediately > because it's rack-local. However, this bug may be adversely affecting the > reporting of job status by underreporting the number of tasks that were node > local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roger Hoover updated YARN-412: -- Attachment: YARN-412.patch > FifoScheduler incorrectly checking for node locality > > > Key: YARN-412 > URL: https://issues.apache.org/jira/browse/YARN-412 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Roger Hoover >Assignee: Roger Hoover >Priority: Minor > Labels: patch > Attachments: YARN-412.patch > > > In the FifoScheduler, the assignNodeLocalContainers method is checking if the > data is local to a node by searching for the nodeAddress of the node in the > set of outstanding requests for the app. This seems to be incorrect as it > should be checking hostname instead. The offending line of code is 455: > application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); > Requests are formated by hostname (e.g. host1.foo.com) whereas node addresses > are a concatenation of hostname and command port (e.g. host1.foo.com:1234) > In the CapacityScheduler, it's done using hostname. See > LeafQueue.assignNodeLocalContainers, line 1129 > application.getResourceRequest(priority, node.getHostName()); > Note that this bug does not affect the actual scheduling decisions made by > the FifoScheduler because even though it incorrect determines that a request > is not local to the node, it will still schedule the request immediately > because it's rack-local. However, this bug may be adversely affecting the > reporting of job status by underreporting the number of tasks that were node > local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621023#comment-13621023 ] Robert Joseph Evans commented on YARN-528: -- OK, I understand now. I will try to find some time to play around with getting the AM ID to not have a wrapper at all. > Make IDs read only > -- > > Key: YARN-528 > URL: https://issues.apache.org/jira/browse/YARN-528 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Attachments: YARN-528.txt, YARN-528.txt > > > I really would like to rip out most if not all of the abstraction layer that > sits in-between Protocol Buffers, the RPC, and the actual user code. We have > no plans to support any other serialization type, and the abstraction layer > just, makes it more difficult to change protocols, makes changing them more > error prone, and slows down the objects themselves. > Completely doing that is a lot of work. This JIRA is a first step towards > that. It makes the various ID objects immutable. If this patch is wel > received I will try to go through other objects/classes of objects and update > them in a similar way. > This is probably the last time we will be able to make a change like this > before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
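A minimal sketch of the direction discussed in this issue — ID objects whose fields are fixed at construction, with no setters and value-based equality — offered only as an illustration, not the class layout the YARN-528 patch actually uses:

{code}
/** Illustrative immutable ID: all state fixed at construction, no setters. */
public final class ImmutableId {
  private final long clusterTimestamp;
  private final int id;

  public ImmutableId(long clusterTimestamp, int id) {
    this.clusterTimestamp = clusterTimestamp;
    this.id = id;
  }

  public long getClusterTimestamp() { return clusterTimestamp; }
  public int getId() { return id; }

  // Value semantics so the ID stays usable as a map key across the RPC layer.
  @Override
  public int hashCode() {
    return 31 * Long.valueOf(clusterTimestamp).hashCode() + id;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof ImmutableId)) {
      return false;
    }
    ImmutableId other = (ImmutableId) obj;
    return clusterTimestamp == other.clusterTimestamp && id == other.id;
  }
}
{code}

Read-only IDs of this shape are what make it plausible to thin out, or as suggested above for the AM ID possibly remove, the wrapper layer over the protobuf records.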
[jira] [Commented] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs
[ https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620974#comment-13620974 ] Steve Loughran commented on YARN-535: - {code} org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher Time elapsed: 4137 sec <<< ERROR! java.lang.RuntimeException: Error parsing 'yarn-site.xml' : org.xml.sax.SAXParseException: Premature end of file. at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2050) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1899) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1816) at org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:465) at org.apache.hadoop.conf.Configuration.asXmlDocument(Configuration.java:2127) at org.apache.hadoop.conf.Configuration.writeXml(Configuration.java:2096) at org.apache.hadoop.conf.Configuration.writeXml(Configuration.java:2086) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.setup(TestUnmanagedAMLauncher.java:63) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) Caused by: org.xml.sax.SAXParseException: Premature end of file. at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:246) at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284) at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:153) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:1887) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:1875) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1946) ... 
29 more {code} This stack trace is a failure to read the file yarn-site.xml, which is actually being written on line 63 of TestUnmanagedAMLauncher -a file that is already open for writing. It is possible that some filesystems (here, HFS+) make that write visible while it is still going on, triggering a failure which then corrupts later builds at init time {code} $ ls -l target/test-classes/yarn-site.xml -rw-r--r-- 1 stevel staff 0 3 Apr 15:37 target/test-classes/yarn-site.xml {code} This is newer than the one in test/properties, so Maven doesn't fix it next test run {code} $ ls -l src/test/resources/yarn-site.xml -rw-r--r--@ 1 stevel staff 830 28 Nov 16:29 src/test/resources/yarn-site.xml {code} as a result, follow on tests fail when MiniYARNCluster tries to read it. {code} org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher Time elapsed: 515 sec <<< ERROR! java.lang.RuntimeException: Error parsing 'yarn-site.xml' : org.xml.sax.SAXParseException: Premature end of file. at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2050) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1899) at
[jira] [Created] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs
Steve Loughran created YARN-535: --- Summary: TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs Key: YARN-535 URL: https://issues.apache.org/jira/browse/YARN-535 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 3.0.0 Environment: OS/X laptop, HFS+ filesystem Reporter: Steve Loughran Priority: Minor the setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. As {{Configuration.writeXml()}} does a reread of all resources, this will break if the (open-for-writing) resource is already visible as an empty file. This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks later test runs -because it is not overwritten by later incremental builds, due to timestamps. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
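One conventional way to avoid a half-written file becoming visible on the test classpath — sketched here as a general pattern, not as the fix this JIRA ultimately adopted — is to write the configuration to a temporary file and rename it into place only after the write completes:

{code}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;

/** Sketch: never leave a partially written yarn-site.xml visible to readers. */
public class SafeConfigWriter {

  public static void writeAtomically(Configuration conf, File target)
      throws IOException {
    File tmp = new File(target.getParentFile(), target.getName() + ".tmp");
    OutputStream out = new FileOutputStream(tmp);
    try {
      // writeXml may re-read all resources; the half-written tmp file is
      // not on the classpath, so that re-read never sees an empty file.
      conf.writeXml(out);
    } finally {
      out.close();
    }
    // The rename is the visibility point: on POSIX filesystems readers see
    // either the old complete file or the new complete file, never a stub.
    if (!tmp.renameTo(target)) {
      throw new IOException("could not rename " + tmp + " to " + target);
    }
  }
}
{code}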
[jira] [Updated] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-465: -- Attachment: YARN-465-trunk-a.patch > fix coverage org.apache.hadoop.yarn.server.webproxy > > > Key: YARN-465 > URL: https://issues.apache.org/jira/browse/YARN-465 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-465-branch-0.23-a.patch, > YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, > YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch > > > fix coverage org.apache.hadoop.yarn.server.webproxy > patch YARN-465-trunk.patch for trunk > patch YARN-465-branch-2.patch for branch-2 > patch YARN-465-branch-0.23.patch for branch-0.23 > There is issue in branch-0.23 . Patch does not creating .keep file. > For fix it need to run commands: > mkdir > yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy > touch > yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620914#comment-13620914 ] Aleksey Gorshkov commented on YARN-465: --- Patches updated: YARN-465-trunk-a.patch for trunk, YARN-465-branch-2-a.patch for branch-2, and YARN-465-branch-0.23-a.patch for branch-0.23. > fix coverage org.apache.hadoop.yarn.server.webproxy > > > Key: YARN-465 > URL: https://issues.apache.org/jira/browse/YARN-465 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-465-branch-0.23-a.patch, > YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, > YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch > > > fix coverage org.apache.hadoop.yarn.server.webproxy > patch YARN-465-trunk.patch for trunk > patch YARN-465-branch-2.patch for branch-2 > patch YARN-465-branch-0.23.patch for branch-0.23 > There is issue in branch-0.23 . Patch does not creating .keep file. > For fix it need to run commands: > mkdir > yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy > touch > yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620913#comment-13620913 ] Steve Loughran commented on YARN-117: - I'm not seeing all those tests failing locally, only {{TestUnmanagedAMLauncher}} and {{TestNMExpiry}}. {code} testNMExpiry(org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry) Time elapsed: 2797 sec <<< FAILURE! junit.framework.AssertionFailedError: expected:<2> but was:<0> at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.failNotEquals(Assert.java:283) at junit.framework.Assert.assertEquals(Assert.java:64) at junit.framework.Assert.assertEquals(Assert.java:195) at junit.framework.Assert.assertEquals(Assert.java:201) at org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry.testNMExpiry(TestNMExpiry.java:157) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) {code} and {code} Running org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.579 sec <<< FAILURE! org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher Time elapsed: 579 sec <<< ERROR! org.apache.hadoop.yarn.YarnException: could not cleanup test dir: java.lang.RuntimeException: Error parsing 'yarn-site.xml' : org.xml.sax.SAXParseException: Premature end of file. 
at org.apache.hadoop.yarn.server.MiniYARNCluster.(MiniYARNCluster.java:95) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.setup(TestUnmanagedAMLauncher.java:52) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > Enhance YARN service model > -- > > Key: YARN-117 > URL: https://issues.apache.org/jira/browse/YARN-117 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-117.patch > > > Having played the YARN service model, there are some issues > that I've identified based on past work and initial use. > This JIRA issue is an overall one to cover the issues, with solutions pushed > out to separate JIRAs. > h2. state model prevents stopped state being entered if you could not > successfully start the service. > In the current lifecycle you cannot stop a service unless it was successfully > started, but > * {{init()}} may acquire resources that need to be explicitly released > * if the {{start()}} operation fails partway through, the {{stop()}} > operation may be needed to release resources. > *Fix:* make {{stop()}} a valid state transition from all states and require > the implementations to be able to stop sa
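To illustrate the lifecycle behaviour the description asks for, here is a minimal sketch of a service whose {{stop()}} is a valid, idempotent transition from every state, so resources acquired in {{init()}} or in a partially failed {{start()}} can still be released. It is an illustration only, not the Hadoop {{AbstractService}} code that YARN-117 actually changes.
{code}
// Minimal sketch of the proposed lifecycle behaviour; names are illustrative.
public class RobustService {
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;
  private Object resource;          // stands in for anything acquired in init()/start()

  public synchronized void init() {
    resource = new Object();        // may acquire resources that must be released later
    state = State.INITED;
  }

  public synchronized void start() {
    // if this throws partway through, callers can still invoke stop()
    state = State.STARTED;
  }

  /** Valid from every state, including NOTINITED, and safe to call more than once. */
  public synchronized void stop() {
    if (state == State.STOPPED) {
      return;                       // idempotent: a second stop() is a no-op
    }
    if (resource != null) {
      resource = null;              // release whatever init()/start() managed to acquire
    }
    state = State.STOPPED;
  }
}
{code}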
[jira] [Updated] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-465: -- Attachment: YARN-465-branch-2-a.patch YARN-465-branch-0.23-a.patch > fix coverage org.apache.hadoop.yarn.server.webproxy > > > Key: YARN-465 > URL: https://issues.apache.org/jira/browse/YARN-465 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-465-branch-0.23-a.patch, > YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, > YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch > > > fix coverage org.apache.hadoop.yarn.server.webproxy > patch YARN-465-trunk.patch for trunk > patch YARN-465-branch-2.patch for branch-2 > patch YARN-465-branch-0.23.patch for branch-0.23 > There is issue in branch-0.23 . Patch does not creating .keep file. > For fix it need to run commands: > mkdir > yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy > touch > yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620845#comment-13620845 ] Aleksey Gorshkov commented on YARN-427: --- Patches were updated: YARN-427-trunk-a.patch for trunk, and YARN-427-branch-2-a.patch for branch-2 and branch-0.23. > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, > YARN-427-trunk-a.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*
[ https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-427: -- Attachment: YARN-427-trunk-a.patch YARN-427-branch-2-a.patch > Coverage fix for org.apache.hadoop.yarn.server.api.* > > > Key: YARN-427 > URL: https://issues.apache.org/jira/browse/YARN-427 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-427-branch-2-a.patch, YARN-427-branch-2.patch, > YARN-427-trunk-a.patch, YARN-427-trunk.patch > > > Coverage fix for org.apache.hadoop.yarn.server.api.* > patch YARN-427-trunk.patch for trunk > patch YARN-427-branch-2.patch for branch-2 and branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-425) coverage fix for yarn api
[ https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620843#comment-13620843 ] Aleksey Gorshkov commented on YARN-425: --- Updated patches: YARN-425-trunk-b.patch for trunk and YARN-425-branch-2-b.patch for branch-2. > coverage fix for yarn api > - > > Key: YARN-425 > URL: https://issues.apache.org/jira/browse/YARN-425 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-425-branch-0.23.patch, YARN-425-branch-2-b.patch, > YARN-425-branch-2.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, > YARN-425-trunk.patch > > > coverage fix for yarn api > patch YARN-425-trunk-a.patch for trunk > patch YARN-425-branch-2.patch for branch-2 > patch YARN-425-branch-0.23.patch for branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-425) coverage fix for yarn api
[ https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-425: -- Attachment: YARN-425-trunk-b.patch > coverage fix for yarn api > - > > Key: YARN-425 > URL: https://issues.apache.org/jira/browse/YARN-425 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-425-branch-0.23.patch, YARN-425-branch-2-b.patch, > YARN-425-branch-2.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, > YARN-425-trunk.patch > > > coverage fix for yarn api > patch YARN-425-trunk-a.patch for trunk > patch YARN-425-branch-2.patch for branch-2 > patch YARN-425-branch-0.23.patch for branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-425) coverage fix for yarn api
[ https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated YARN-425: -- Attachment: YARN-425-branch-2-b.patch > coverage fix for yarn api > - > > Key: YARN-425 > URL: https://issues.apache.org/jira/browse/YARN-425 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta >Reporter: Aleksey Gorshkov >Assignee: Aleksey Gorshkov > Attachments: YARN-425-branch-0.23.patch, YARN-425-branch-2-b.patch, > YARN-425-branch-2.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, > YARN-425-trunk.patch > > > coverage fix for yarn api > patch YARN-425-trunk-a.patch for trunk > patch YARN-425-branch-2.patch for branch-2 > patch YARN-425-branch-0.23.patch for branch-0.23 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-527) Local filecache mkdir fails
[ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620711#comment-13620711 ] Knut O. Hellan commented on YARN-527: - There is really no difference in how the directories are created. What probably happened under the hood was that the file system reached the maximum number of entries in the filecache directory. This maximum is roughly 32,000, since we use EXT3. I don't have the exact numbers for any of the disks from my checks, but I remember seeing above 30k in some places. The reason we were able to manually create directories might be that there was some automatic cleanup happening. Does YARN clean the file cache? > Local filecache mkdir fails > --- > > Key: YARN-527 > URL: https://issues.apache.org/jira/browse/YARN-527 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.0-alpha > Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes > and six worker nodes. >Reporter: Knut O. Hellan >Priority: Minor > Attachments: yarn-site.xml > > > Jobs failed with no other explanation than this stack trace: > 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diag > nostics report from attempt_1364591875320_0017_m_00_0: > java.io.IOException: mkdir of /disk3/yarn/local/filecache/-42307893 > 55400878397 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) > at > org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Manually creating the directory worked. This behavior was common to at least > several nodes in the cluster. > The situation was resolved by removing and recreating all > /disk?/yarn/local/filecache directories on all nodes. > It is unclear whether Yarn struggled with the number of files or if there > were corrupt files in the caches. The situation was triggered by a node dying. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
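As a rough way to check that theory on an affected node, the following sketch counts the entries in each local filecache directory so they can be compared against EXT3's roughly 32,000-subdirectory limit. The directory paths are taken from the report above and are otherwise assumptions; this is a standalone diagnostic, not a YARN API.
{code}
// Rough diagnostic sketch: count entries per filecache directory.
import java.io.File;

public class FilecacheCounter {
  public static void main(String[] args) {
    String[] dirs = args.length > 0 ? args
        : new String[] { "/disk1/yarn/local/filecache", "/disk2/yarn/local/filecache",
                         "/disk3/yarn/local/filecache" };
    for (String d : dirs) {
      String[] entries = new File(d).list();
      int count = (entries == null) ? -1 : entries.length;   // -1: missing or unreadable
      System.out.println(d + " -> " + count + " entries");
    }
  }
}
{code}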
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620705#comment-13620705 ] Sandy Ryza commented on YARN-392: - Ok, I will work on a patch for the non-blacklist proposal. To clarify, should location-specific requests be able to coexist with non-location-specific requests at the same priority? > Make it possible to schedule to specific nodes without dropping locality > > > Key: YARN-392 > URL: https://issues.apache.org/jira/browse/YARN-392 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Sandy Ryza > Attachments: YARN-392-1.patch, YARN-392.patch > > > Currently its not possible to specify scheduling requests for specific nodes > and nowhere else. The RM automatically relaxes locality to rack and * and > assigns non-specified machines to the app. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
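For reference, the two request shapes under discussion look roughly like this. The sketch uses the {{ResourceRequest}} factory methods as they appear in later Hadoop 2.x releases (the exact API was still being designed when this comment was written), and the priorities and sizes are made up.
{code}
// Illustrative sketch of a node-specific request versus an unconstrained one;
// values and priorities are invented for the example.
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class RequestShapes {
  public static void main(String[] args) {
    Resource oneGb = Resource.newInstance(1024, 1);

    // Node-specific request: should only run on host1.foo.com, never relaxed.
    ResourceRequest nodeOnly =
        ResourceRequest.newInstance(Priority.newInstance(1), "host1.foo.com", oneGb, 1);

    // Unconstrained request: may run anywhere in the cluster.
    ResourceRequest anywhere =
        ResourceRequest.newInstance(Priority.newInstance(2), ResourceRequest.ANY, oneGb, 1);

    // The open question above is whether requests like these two could share
    // the same priority without the RM relaxing the node-specific one.
    System.out.println(nodeOnly + "\n" + anywhere);
  }
}
{code}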