[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081988#comment-14081988 ] Hadoop QA commented on YARN-2008: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12659068/YARN-2008.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4503//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4503//console This message is automatically generated. > CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure > > Key: YARN-2008 > URL: https://issues.apache.org/jira/browse/YARN-2008 > Project: Hadoop YARN > Issue Type: Sub-task > Affects Versions: 2.3.0 > Reporter: Chen He > Assignee: Craig Welch > Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, YARN-2008.4.patch, YARN-2008.5.patch > > > Suppose there are two queues, Q1 and Q2, both allowed to use 100% of the actual resources in the cluster. Q1 and Q2 each currently use 50% of the actual cluster's resources and there is no actual space available. If we use the current method to get headroom, the CapacityScheduler thinks there are still available resources for users in Q1, but those resources have already been used by Q2. > If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example: rootQueue has two children, L1ParentQueue1 (allowed to use up to 80% of its parent) and L1ParentQueue2 (allowed to use 20% in minimum of its parent); L1ParentQueue1 in turn has two children, L2LeafQueue1 (50% of its parent) and L2LeafQueue2 (50% of its parent in minimum). > When we calculate the headroom of a user in L2LeafQueue2, the current method thinks L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. However, without checking L1ParentQueue1, we cannot be sure: it is possible that L1ParentQueue2 has used 40% of the rootQueue resources right now, in which case L2LeafQueue2 can only use 30% (60% * 50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
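The arithmetic in the YARN-2008 description above can be followed with a small sketch. This is not the attached patch; the variable names are purely illustrative and the numbers simply restate the example (80%, 50%, 40%).
{code}
// Sketch of the queueMaxCap arithmetic from the YARN-2008 example above.
public class QueueMaxCapExample {
  public static void main(String[] args) {
    float l1Parent1MaxOfRoot = 0.8f;   // L1ParentQueue1 may use up to 80% of rootQueue
    float l2Leaf2MaxOfParent = 0.5f;   // L2LeafQueue2 may use up to 50% of L1ParentQueue1
    float l1Parent2UsedOfRoot = 0.4f;  // L1ParentQueue2 currently uses 40% of rootQueue

    // Current method: only multiplies configured capacities down the hierarchy.
    float naiveMaxCap = l1Parent1MaxOfRoot * l2Leaf2MaxOfParent;              // 0.40

    // Hierarchy-aware view: L1ParentQueue1 cannot exceed what is actually left
    // at the root once its sibling's current usage is subtracted.
    float parentActuallyAvailable =
        Math.min(l1Parent1MaxOfRoot, 1.0f - l1Parent2UsedOfRoot);             // 0.60
    float effectiveMaxCap = parentActuallyAvailable * l2Leaf2MaxOfParent;     // 0.30

    System.out.printf("naive queueMaxCap = %.2f, effective queueMaxCap = %.2f%n",
        naiveMaxCap, effectiveMaxCap);
  }
}
{code}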
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081985#comment-14081985 ] Hadoop QA commented on YARN-2069: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12659051/YARN-2069-trunk-9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4502//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4502//console This message is automatically generated. > CS queue level preemption should respect user-limits > > > Key: YARN-2069 > URL: https://issues.apache.org/jira/browse/YARN-2069 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Vinod Kumar Vavilapalli >Assignee: Mayank Bansal > Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, > YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, > YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch, > YARN-2069-trunk-9.patch > > > This is different from (even if related to, and likely share code with) > YARN-2113. > YARN-2113 focuses on making sure that even if queue has its guaranteed > capacity, it's individual users are treated in-line with their limits > irrespective of when they join in. > This JIRA is about respecting user-limits while preempting containers to > balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
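The idea of respecting user-limits while preempting can be illustrated with a small sketch: per user, only the portion above the queue's computed user-limit is a candidate for preemption. This is not Mayank's patch; the class, method, and unit choices below are hypothetical.
{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: cap preemption per user at (used - userLimit).
public class UserLimitAwarePreemption {
  static Map<String, Long> preemptableByUser(Map<String, Long> usedByUser, long userLimit) {
    Map<String, Long> result = new LinkedHashMap<>();
    for (Map.Entry<String, Long> e : usedByUser.entrySet()) {
      // Never take a user below the user-limit, even when balancing queue capacity.
      result.put(e.getKey(), Math.max(0L, e.getValue() - userLimit));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Long> usedMB = new LinkedHashMap<>();
    usedMB.put("alice", 8L * 1024);  // above the limit: partially preemptable
    usedMB.put("bob",   2L * 1024);  // already below the limit: left alone
    System.out.println(preemptableByUser(usedMB, 4L * 1024)); // {alice=4096, bob=0}
  }
}
{code}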
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: (was: YARN-2348.2.patch) > ResourceManager web UI should display server-side time instead of UTC time > -- > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Leitao Guo > Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch > > > ResourceManager web UI, including application list and scheduler, displays > UTC time in default, this will confuse users who do not use UTC time. This > web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: (was: YARN-2348.patch) > ResourceManager web UI should display server-side time instead of UTC time > -- > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Leitao Guo > Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch > > > ResourceManager web UI, including application list and scheduler, displays > UTC time in default, this will confuse users who do not use UTC time. This > web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
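For context on YARN-2348, the requested behaviour amounts to formatting timestamps with the ResourceManager host's own time zone rather than UTC. The snippet below is only a sketch of that difference, not the attached patch, and it says nothing about where the real web UI performs the rendering.
{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class ServerLocalTimeExample {
  public static void main(String[] args) {
    long startTime = System.currentTimeMillis();

    SimpleDateFormat utc = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy");
    utc.setTimeZone(TimeZone.getTimeZone("UTC"));             // what users complain about

    SimpleDateFormat serverLocal = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy");
    serverLocal.setTimeZone(TimeZone.getDefault());           // the RM host's own time zone

    System.out.println("UTC rendering:    " + utc.format(new Date(startTime)));
    System.out.println("Server rendering: " + serverLocal.format(new Date(startTime)));
  }
}
{code}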
[jira] [Commented] (YARN-2051) Fix bug in PBimpls and add more unit tests with reflection
[ https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081899#comment-14081899 ] Hudson commented on YARN-2051: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5993 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5993/]) YARN-2051. Fix bug in PBimpls and add more unit tests with reflection. (Contributed by Binglin Chang) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1615025) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationsRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceOption.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetApplicationsRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationReportPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceBlacklistRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceOptionPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/TokenPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UpdateNodeResourceRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java > Fix bug in PBimpls and add more unit tests with reflection > -- > > Key: YARN-2051 > URL: https://issues.apache.org/jira/browse/YARN-2051 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Binglin Chang >Priority: Critical > Attachments: YARN-2051.v1.patch, YARN-2051.v2.patch > > > From YARN-2016, we can see some bug could exist in PB implementation of > protocol. The bad news is most of these PBImpl don't have any unit test to > verify the info is not lost or changed after serialization/deserialization. > We should add more tests for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
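The reflection-based testing referenced here follows a general pattern: populate a record's setters with generated values, round-trip the record through its protobuf form, and verify nothing is lost or changed. The sketch below only illustrates that pattern; it is not the committed TestPBImplRecords, and SomeRecordPBImpl is a stand-in name for any PBImpl class.
{code}
import java.lang.reflect.Method;

public class PBImplRoundTripSketch {
  // Fill every single-argument setter with a sample value of the right type.
  static void fillWithSampleValues(Object record) throws Exception {
    for (Method m : record.getClass().getMethods()) {
      if (m.getName().startsWith("set") && m.getParameterTypes().length == 1) {
        Class<?> t = m.getParameterTypes()[0];
        if (t == int.class)          m.invoke(record, 42);
        else if (t == long.class)    m.invoke(record, 42L);
        else if (t == boolean.class) m.invoke(record, true);
        else if (t == String.class)  m.invoke(record, "sample");
        // a real test would also generate enums, lists, and nested records
      }
    }
  }
  // Round trip, in outline (SomeRecordPBImpl is hypothetical):
  //   SomeRecordPBImpl original = new SomeRecordPBImpl();
  //   fillWithSampleValues(original);
  //   SomeRecordPBImpl restored = new SomeRecordPBImpl(original.getProto());
  //   assert original.equals(restored);  // fails if a PBImpl drops or changes a field
}
{code}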
[jira] [Updated] (YARN-2051) Fix bug in PBimpls and add more unit tests with reflection
[ https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2051: - Summary: Fix bug in PBimpls and add more unit tests with reflection (was: Fix code in PBimpls and add more unit tests with reflection) > Fix bug in PBimpls and add more unit tests with reflection > -- > > Key: YARN-2051 > URL: https://issues.apache.org/jira/browse/YARN-2051 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Binglin Chang >Priority: Critical > Attachments: YARN-2051.v1.patch, YARN-2051.v2.patch > > > From YARN-2016, we can see some bug could exist in PB implementation of > protocol. The bad news is most of these PBImpl don't have any unit test to > verify the info is not lost or changed after serialization/deserialization. > We should add more tests for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2051) Fix code in PBimpls and add more unit tests with reflection
[ https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2051: - Summary: Fix code in PBimpls and add more unit tests with reflection (was: Fix code bug and add more unit tests for PBImpls) > Fix code in PBimpls and add more unit tests with reflection > --- > > Key: YARN-2051 > URL: https://issues.apache.org/jira/browse/YARN-2051 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Binglin Chang >Priority: Critical > Attachments: YARN-2051.v1.patch, YARN-2051.v2.patch > > > From YARN-2016, we can see some bug could exist in PB implementation of > protocol. The bad news is most of these PBImpl don't have any unit test to > verify the info is not lost or changed after serialization/deserialization. > We should add more tests for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081845#comment-14081845 ] Wangda Tan commented on YARN-2069: -- Hi [~mayank_bansal], Thanks for uploading, reviewing it now. Wangda > CS queue level preemption should respect user-limits > > > Key: YARN-2069 > URL: https://issues.apache.org/jira/browse/YARN-2069 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Vinod Kumar Vavilapalli >Assignee: Mayank Bansal > Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, > YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, > YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch, > YARN-2069-trunk-9.patch > > > This is different from (even if related to, and likely share code with) > YARN-2113. > YARN-2113 focuses on making sure that even if queue has its guaranteed > capacity, it's individual users are treated in-line with their limits > irrespective of when they join in. > This JIRA is about respecting user-limits while preempting containers to > balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081844#comment-14081844 ] Wangda Tan commented on YARN-2008: -- Hi [~cwelch], Thanks for updating; the tests now cover all the cases I can think of. A very minor comment: could you please add a small ε to all {{assertEquals}} calls, like the following? bq. +assertEquals(0.1f, result, 0.01f); Thanks, Wangda > CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure > > Key: YARN-2008 > URL: https://issues.apache.org/jira/browse/YARN-2008 > Project: Hadoop YARN > Issue Type: Sub-task > Affects Versions: 2.3.0 > Reporter: Chen He > Assignee: Craig Welch > Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, YARN-2008.4.patch, YARN-2008.5.patch > > > Suppose there are two queues, Q1 and Q2, both allowed to use 100% of the actual resources in the cluster. Q1 and Q2 each currently use 50% of the actual cluster's resources and there is no actual space available. If we use the current method to get headroom, the CapacityScheduler thinks there are still available resources for users in Q1, but those resources have already been used by Q2. > If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example: rootQueue has two children, L1ParentQueue1 (allowed to use up to 80% of its parent) and L1ParentQueue2 (allowed to use 20% in minimum of its parent); L1ParentQueue1 in turn has two children, L2LeafQueue1 (50% of its parent) and L2LeafQueue2 (50% of its parent in minimum). > When we calculate the headroom of a user in L2LeafQueue2, the current method thinks L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. However, without checking L1ParentQueue1, we cannot be sure: it is possible that L1ParentQueue2 has used 40% of the rootQueue resources right now, in which case L2LeafQueue2 can only use 30% (60% * 50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
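The ε Wangda asks for is the standard JUnit delta argument: the capacities under test are products of floats, so exact equality is brittle. A minimal illustration, not part of the patch itself:
{code}
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class EpsilonAssertExample {
  @Test
  public void floatComparisonUsesDelta() {
    float result = 0.8f * 0.5f * 0.25f;   // chained capacity math, roughly 0.1
    // The third argument tolerates float rounding instead of requiring bit-exact equality.
    assertEquals(0.1f, result, 0.01f);
  }
}
{code}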
[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically
[ https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081841#comment-14081841 ] Xuan Gong commented on YARN-2212: - Test can be passed locally.. > ApplicationMaster needs to find a way to update the AMRMToken periodically > -- > > Key: YARN-2212 > URL: https://issues.apache.org/jira/browse/YARN-2212 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2212.1.patch, YARN-2212.2.patch, > YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, > YARN-2212.5.patch, YARN-2212.5.rebase.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
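One way the AM side of YARN-2212 could look is sketched below. This is hypothetical wiring, not the attached patches: it assumes the allocate response carries the rolled-over token through an accessor along the lines of AllocateResponse#getAMRMToken, converts it, and adds it to the current UGI so later heartbeats authenticate with the new token.
{code}
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.security.AMRMTokenIdentifier;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class AMRMTokenRefreshSketch {
  static void maybeUpdateToken(AllocateResponse response, String rmAddress)
      throws IOException {
    org.apache.hadoop.yarn.api.records.Token yarnToken = response.getAMRMToken();
    if (yarnToken == null) {
      return; // the RM did not roll the master key in this heartbeat
    }
    Token<AMRMTokenIdentifier> token =
        ConverterUtils.convertFromYarn(yarnToken, new Text(rmAddress));
    // Make the new token visible to the RPC layer for subsequent heartbeats.
    UserGroupInformation.getCurrentUser().addToken(token);
  }
}
{code}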
[jira] [Updated] (YARN-2288) Data persistent in timelinestore should be versioned
[ https://issues.apache.org/jira/browse/YARN-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2288: - Attachment: YARN-2288.patch Upload the patch to version timelinestore. > Data persistent in timelinestore should be versioned > > > Key: YARN-2288 > URL: https://issues.apache.org/jira/browse/YARN-2288 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.4.1 >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-2288.patch > > > We have LevelDB-backed TimelineStore, it should have schema version for > changes in schema in future. -- This message was sent by Atlassian JIRA (v6.2#6252)
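A rough sketch of what "versioned" data could mean here: keep a schema version under a reserved key in the store and refuse to start on an incompatible layout. This is not the attached patch; KeyValueStore, the key name, and the major/minor compatibility split are assumptions for illustration.
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class TimelineStoreVersionCheck {
  interface KeyValueStore {            // hypothetical minimal stand-in for the LevelDB store
    byte[] get(byte[] key);
    void put(byte[] key, byte[] value);
  }

  static final byte[] VERSION_KEY =
      "timeline-store-version".getBytes(StandardCharsets.UTF_8);
  static final int CURRENT_MAJOR = 1;
  static final int CURRENT_MINOR = 0;

  static void checkVersion(KeyValueStore store) throws IOException {
    byte[] raw = store.get(VERSION_KEY);
    if (raw == null) {                                   // fresh store: stamp it
      store.put(VERSION_KEY,
          (CURRENT_MAJOR + "." + CURRENT_MINOR).getBytes(StandardCharsets.UTF_8));
      return;
    }
    int storedMajor =
        Integer.parseInt(new String(raw, StandardCharsets.UTF_8).split("\\.")[0]);
    if (storedMajor != CURRENT_MAJOR) {                  // major bump = incompatible
      throw new IOException("Incompatible timeline store version " + storedMajor
          + ", expected " + CURRENT_MAJOR);
    }
    // Same major, different minor: treated as compatible in this sketch.
  }
}
{code}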
[jira] [Commented] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081832#comment-14081832 ] Zhijie Shen commented on YARN-2372: --- There're non-unicode double quotes in HdfsDesign.apt.vm and HdfsNfsGateway.apt.vm. It's not a big change, and I think we can fix them in one patch. > There are Chinese Characters in the FairScheduler's document > > > Key: YARN-2372 > URL: https://issues.apache.org/jira/browse/YARN-2372 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.4.1 >Reporter: Fengdong Yu >Assignee: Fengdong Yu >Priority: Minor > Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, > YARN-2372.patch, YARN-2372.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically
[ https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081826#comment-14081826 ] Hadoop QA commented on YARN-2212: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658995/YARN-2212.5.rebase.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4501//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4501//console This message is automatically generated. > ApplicationMaster needs to find a way to update the AMRMToken periodically > -- > > Key: YARN-2212 > URL: https://issues.apache.org/jira/browse/YARN-2212 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2212.1.patch, YARN-2212.2.patch, > YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, > YARN-2212.5.patch, YARN-2212.5.rebase.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2008: -- Attachment: YARN-2008.5.patch This time, actually with the additional tests :-) > CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure > > Key: YARN-2008 > URL: https://issues.apache.org/jira/browse/YARN-2008 > Project: Hadoop YARN > Issue Type: Sub-task > Affects Versions: 2.3.0 > Reporter: Chen He > Assignee: Craig Welch > Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, YARN-2008.4.patch, YARN-2008.5.patch > > > Suppose there are two queues, Q1 and Q2, both allowed to use 100% of the actual resources in the cluster. Q1 and Q2 each currently use 50% of the actual cluster's resources and there is no actual space available. If we use the current method to get headroom, the CapacityScheduler thinks there are still available resources for users in Q1, but those resources have already been used by Q2. > If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example: rootQueue has two children, L1ParentQueue1 (allowed to use up to 80% of its parent) and L1ParentQueue2 (allowed to use 20% in minimum of its parent); L1ParentQueue1 in turn has two children, L2LeafQueue1 (50% of its parent) and L2LeafQueue2 (50% of its parent in minimum). > When we calculate the headroom of a user in L2LeafQueue2, the current method thinks L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. However, without checking L1ParentQueue1, we cannot be sure: it is possible that L1ParentQueue2 has used 40% of the rootQueue resources right now, in which case L2LeafQueue2 can only use 30% (60% * 50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2377) Localization exception stack traces are not passed as diagnostic info
[ https://issues.apache.org/jira/browse/YARN-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-2377: Attachment: YARN-2377.v01.patch v01 for review. With this you get a more actionable stack trace: {code} 14/07/31 17:46:39 INFO mapreduce.Job: Job job_1406853387336_0001 failed with state FAILED due to: Application application_1406853387336_0001 failed 2 times due to AM Container for appattempt_1406853387336_0001_02 exited with exitCode: -1000 For more detailed output, check application tracking page:http://tw-mbp-gshegalov:8088/proxy/application_1406853387336_0001/Then, click on links to logs of each attempt. Diagnostics: java.net.UnknownHostException: ha-nn-uri-0 java.lang.IllegalArgumentException: java.net.UnknownHostException: ha-nn-uri-0 at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373) at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:260) at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:153) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:607) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:552) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:139) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2590) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2624) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2606) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:248) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:60) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:356) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:354) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:394) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:353) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695) Caused by: java.net.UnknownHostException: ha-nn-uri-0 ... 
29 more Caused by: ha-nn-uri-0 java.lang.IllegalArgumentException: java.net.UnknownHostException: ha-nn-uri-0 at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373) at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:260) at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:153) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:607) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:552) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:139) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2590) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2624) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2606) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:248) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:60) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:356) at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:354) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:394) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:353) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) at java.util.concurrent.FutureTask$Sync.innerRun
[jira] [Commented] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081777#comment-14081777 ] Fengdong Yu commented on YARN-2372: --- I cannot find any more places on this issue by now. Thanks. > There are Chinese Characters in the FairScheduler's document > > > Key: YARN-2372 > URL: https://issues.apache.org/jira/browse/YARN-2372 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.4.1 >Reporter: Fengdong Yu >Assignee: Fengdong Yu >Priority: Minor > Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, > YARN-2372.patch, YARN-2372.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2377) Localization exception stack traces are not passed as diagnostic info
[ https://issues.apache.org/jira/browse/YARN-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-2377: Description: In the Localizer log one can only see this kind of message {code} 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar, 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos tException: ha-nn-uri-0 {code} And then only {{ java.net.UnknownHostException: ha-nn-uri-0}} message is propagated as diagnostics. was: In the Localizer log one can only see this kind of message {code} 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar, 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos tException: ha-nn-uri-0 {code} And then only {{ java.net.UnknownHos tException: ha-nn-uri-0}} message is propagated as diagnostics. > Localization exception stack traces are not passed as diagnostic info > - > > Key: YARN-2377 > URL: https://issues.apache.org/jira/browse/YARN-2377 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Gera Shegalov >Assignee: Gera Shegalov > > In the Localizer log one can only see this kind of message > {code} > 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { > hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar, > 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos > tException: ha-nn-uri-0 > {code} > And then only {{ java.net.UnknownHostException: ha-nn-uri-0}} message is > propagated as diagnostics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2377) Localization exception stack traces are not passed as diagnostic info
[ https://issues.apache.org/jira/browse/YARN-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-2377: Description: In the Localizer log one can only see this kind of message {code} 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar, 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos tException: ha-nn-uri-0 {code} And then only {{ java.net.UnknownHos tException: ha-nn-uri-0}} message is propagated as diagnostics. was: In the Localizer log one can only see this kind of message {code} 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar, 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos tException: ha-nn-uri-0 {code} And then onlt {{ java.net.UnknownHos tException: ha-nn-uri-0}} message is propagated as diagnostics. > Localization exception stack traces are not passed as diagnostic info > - > > Key: YARN-2377 > URL: https://issues.apache.org/jira/browse/YARN-2377 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Gera Shegalov >Assignee: Gera Shegalov > > In the Localizer log one can only see this kind of message > {code} > 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { > hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar, > 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos > tException: ha-nn-uri-0 > {code} > And then only {{ java.net.UnknownHos tException: ha-nn-uri-0}} message is > propagated as diagnostics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2377) Localization exception stack traces are not passed as diagnostic info
Gera Shegalov created YARN-2377: --- Summary: Localization exception stack traces are not passed as diagnostic info Key: YARN-2377 URL: https://issues.apache.org/jira/browse/YARN-2377 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov In the Localizer log one can only see this kind of message {code} 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar, 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHos tException: ha-nn-uri-0 {code} And then onlt {{ java.net.UnknownHos tException: ha-nn-uri-0}} message is propagated as diagnostics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-9.patch Fixing findbug warning and adding one more test case for no user limit. Thanks, Mayank > CS queue level preemption should respect user-limits > > > Key: YARN-2069 > URL: https://issues.apache.org/jira/browse/YARN-2069 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Vinod Kumar Vavilapalli >Assignee: Mayank Bansal > Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, > YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, > YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch, > YARN-2069-trunk-9.patch > > > This is different from (even if related to, and likely share code with) > YARN-2113. > YARN-2113 focuses on making sure that even if queue has its guaranteed > capacity, it's individual users are treated in-line with their limits > irrespective of when they join in. > This JIRA is about respecting user-limits while preempting containers to > balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081718#comment-14081718 ] Wangda Tan commented on YARN-2008: -- Hi [~cwelch], I found the patch you updated is identical to *.3.patch; could you please check? Thanks > CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure > > Key: YARN-2008 > URL: https://issues.apache.org/jira/browse/YARN-2008 > Project: Hadoop YARN > Issue Type: Sub-task > Affects Versions: 2.3.0 > Reporter: Chen He > Assignee: Craig Welch > Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, YARN-2008.4.patch > > > Suppose there are two queues, Q1 and Q2, both allowed to use 100% of the actual resources in the cluster. Q1 and Q2 each currently use 50% of the actual cluster's resources and there is no actual space available. If we use the current method to get headroom, the CapacityScheduler thinks there are still available resources for users in Q1, but those resources have already been used by Q2. > If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example: rootQueue has two children, L1ParentQueue1 (allowed to use up to 80% of its parent) and L1ParentQueue2 (allowed to use 20% in minimum of its parent); L1ParentQueue1 in turn has two children, L2LeafQueue1 (50% of its parent) and L2LeafQueue2 (50% of its parent in minimum). > When we calculate the headroom of a user in L2LeafQueue2, the current method thinks L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. However, without checking L1ParentQueue1, we cannot be sure: it is possible that L1ParentQueue2 has used 40% of the rootQueue resources right now, in which case L2LeafQueue2 can only use 30% (60% * 50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2376) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job counter i
[ https://issues.apache.org/jira/browse/YARN-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu resolved YARN-2376. - Resolution: Duplicate > Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job counter in JobInProgress > > Key: YARN-2376 > URL: https://issues.apache.org/jira/browse/YARN-2376 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: zhihai xu > Assignee: zhihai xu > Attachments: YARN-2376.000.patch > > > Too many threads block on the global JobTracker lock from getJobCounters; optimize getJobCounters to release the global JobTracker lock before accessing the per-job counters in JobInProgress. Many JobClients may call getJobCounters on the JobTracker at the same time, and the current code locks the JobTracker, blocking all of those threads while counters are fetched from JobInProgress. It is better to unlock the JobTracker when getting counters from JobInProgress (job.getCounters(counters)), so all the threads can run in parallel, each accessing its own job's counters. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2376) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job counter in
[ https://issues.apache.org/jira/browse/YARN-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2376: Attachment: YARN-2376.000.patch > Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job counter in JobInProgress > > Key: YARN-2376 > URL: https://issues.apache.org/jira/browse/YARN-2376 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: zhihai xu > Assignee: zhihai xu > Attachments: YARN-2376.000.patch > > > Too many threads block on the global JobTracker lock from getJobCounters; optimize getJobCounters to release the global JobTracker lock before accessing the per-job counters in JobInProgress. Many JobClients may call getJobCounters on the JobTracker at the same time, and the current code locks the JobTracker, blocking all of those threads while counters are fetched from JobInProgress. It is better to unlock the JobTracker when getting counters from JobInProgress (job.getCounters(counters)), so all the threads can run in parallel, each accessing its own job's counters. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2376) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job counter in
zhihai xu created YARN-2376: --- Summary: Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job counter in JobInProgress Key: YARN-2376 URL: https://issues.apache.org/jira/browse/YARN-2376 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu Too many threads block on the global JobTracker lock from getJobCounters; optimize getJobCounters to release the global JobTracker lock before accessing the per-job counters in JobInProgress. Many JobClients may call getJobCounters on the JobTracker at the same time, and the current code locks the JobTracker, blocking all of those threads while counters are fetched from JobInProgress. It is better to unlock the JobTracker when getting counters from JobInProgress (job.getCounters(counters)), so all the threads can run in parallel, each accessing its own job's counters. -- This message was sent by Atlassian JIRA (v6.2#6252)
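The optimization described above boils down to narrowing the synchronized region: hold the global lock only for the job lookup, then read counters under the per-job lock. A sketch with simplified stand-ins for JobTracker and JobInProgress (not the attached patch):
{code}
import java.util.HashMap;
import java.util.Map;

public class GetJobCountersSketch {
  static class Counters {}
  static class JobInProgress {
    synchronized void getCounters(Counters into) { /* per-job lock only */ }
  }

  static class JobTracker {
    private final Map<String, JobInProgress> jobs = new HashMap<>();

    Counters getJobCounters(String jobId) {
      JobInProgress job;
      synchronized (this) {          // global lock held only for the lookup
        job = jobs.get(jobId);
      }
      Counters counters = new Counters();
      if (job != null) {
        job.getCounters(counters);   // outside the global lock; jobs proceed in parallel
      }
      return counters;
    }
  }
}
{code}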
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081535#comment-14081535 ] Hudson commented on YARN-1994: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5992 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5992/]) YARN-1994. Expose YARN/MR endpoints on multiple interfaces. Contributed by Craig Welch, Milan Potocnik,and Arpit Agarwal (xgong: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1614981) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/AppContext.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/client/MRClientService.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MockAppContext.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRuntimeEstimators.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/jobhistory/JHAdminConfig.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRWebAppUtil.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryClientService.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistory.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/server/HSAdminServer.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/C
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081508#comment-14081508 ] Xuan Gong commented on YARN-1994: - Committed to trunk and branch-2. Thanks, Craig, Arpit and Milan > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Fix For: 2.6.0 > > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, > YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15-branch2.patch, > YARN-1994.15.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, > YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081505#comment-14081505 ] Xuan Gong commented on YARN-1994: - +1 LGTM. Thanks Craig for providing the branch-2 patch > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, > YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15-branch2.patch, > YARN-1994.15.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, > YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1994: -- Attachment: YARN-1994.15-branch2.patch Adding a version of the patch for branch-2, because the one from trunk doesn't cleanly apply. Minor changes to deal with some other uncommited work from trunk in a unit test. This patch will fail when applied to trunk most likely, that can be ignored. > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, > YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15-branch2.patch, > YARN-1994.15.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, > YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
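The gist of YARN-1994 is separating the address a server binds to from the address clients are told to use. The sketch below shows that split; the property names and default value are assumptions for illustration, not a statement of the final configuration keys.
{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.NetUtils;

public class BindHostExample {
  static InetSocketAddress serverBindAddress(Configuration conf) {
    // Address clients are told to use, e.g. "rm-host:8032".
    InetSocketAddress clientAddr = NetUtils.createSocketAddr(
        conf.get("yarn.resourcemanager.address", "0.0.0.0:8032"));
    // Optional bind-host override, e.g. "0.0.0.0" on a multihomed machine.
    String bindHost = conf.get("yarn.resourcemanager.bind-host");
    return bindHost == null
        ? clientAddr
        : new InetSocketAddress(bindHost, clientAddr.getPort());
  }
}
{code}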
[jira] [Commented] (YARN-2304) Test*WebServices* fails intermittently
[ https://issues.apache.org/jira/browse/YARN-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081487#comment-14081487 ] Tsuyoshi OZAWA commented on YARN-2304: -- [~zjshen], thank you for notifying. After the work by [~jlowe], we don't see the test failure anymore. Closed as a fixed problem. > Test*WebServices* fails intermittently > -- > > Key: YARN-2304 > URL: https://issues.apache.org/jira/browse/YARN-2304 > Project: Hadoop YARN > Issue Type: Test >Reporter: Tsuyoshi OZAWA > Attachments: test-failure-log-RMWeb.txt > > > TestNMWebService, TestRMWebService, and TestAMWebService get failed with > address already get bind. -- This message was sent by Atlassian JIRA (v6.2#6252)
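For reference, a common way to avoid "address already in use" flakiness in such tests is to bind to an ephemeral port and read back what the OS assigned. This is a general sketch, not the specific change referenced in the comment above:
{code}
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortExample {
  static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) { // port 0 = let the OS choose
      return socket.getLocalPort();
    }
  }
}
{code}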
[jira] [Resolved] (YARN-2304) Test*WebServices* fails intermittently
[ https://issues.apache.org/jira/browse/YARN-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA resolved YARN-2304. -- Resolution: Fixed > Test*WebServices* fails intermittently > -- > > Key: YARN-2304 > URL: https://issues.apache.org/jira/browse/YARN-2304 > Project: Hadoop YARN > Issue Type: Test >Reporter: Tsuyoshi OZAWA > Attachments: test-failure-log-RMWeb.txt > > > TestNMWebService, TestRMWebService, and TestAMWebService get failed with > address already get bind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically
[ https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2212: Attachment: YARN-2212.5.rebase.patch > ApplicationMaster needs to find a way to update the AMRMToken periodically > -- > > Key: YARN-2212 > URL: https://issues.apache.org/jira/browse/YARN-2212 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2212.1.patch, YARN-2212.2.patch, > YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, > YARN-2212.5.patch, YARN-2212.5.rebase.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081382#comment-14081382 ] Hadoop QA commented on YARN-2008: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658970/YARN-2008.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4500//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4500//console This message is automatically generated. > CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure > > Key: YARN-2008 > URL: https://issues.apache.org/jira/browse/YARN-2008 > Project: Hadoop YARN > Issue Type: Sub-task > Affects Versions: 2.3.0 > Reporter: Chen He > Assignee: Craig Welch > Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, YARN-2008.4.patch > > > Suppose there are two queues, Q1 and Q2, both allowed to use 100% of the actual resources in the cluster. Q1 and Q2 each currently use 50% of the actual cluster's resources and there is no actual space available. If we use the current method to get headroom, the CapacityScheduler thinks there are still available resources for users in Q1, but those resources have already been used by Q2. > If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example: rootQueue has two children, L1ParentQueue1 (allowed to use up to 80% of its parent) and L1ParentQueue2 (allowed to use 20% in minimum of its parent); L1ParentQueue1 in turn has two children, L2LeafQueue1 (50% of its parent) and L2LeafQueue2 (50% of its parent in minimum). > When we calculate the headroom of a user in L2LeafQueue2, the current method thinks L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. However, without checking L1ParentQueue1, we cannot be sure: it is possible that L1ParentQueue2 has used 40% of the rootQueue resources right now, in which case L2LeafQueue2 can only use 30% (60% * 50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081341#comment-14081341 ] Milan Potocnik commented on YARN-1994: -- [~cwelch] looks good, thanks for the effort! +1 from me > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, > YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15.patch, > YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, > YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081331#comment-14081331 ] Hadoop QA commented on YARN-2069: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658971/YARN-2069-trunk-8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4499//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4499//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4499//console This message is automatically generated. > CS queue level preemption should respect user-limits > > > Key: YARN-2069 > URL: https://issues.apache.org/jira/browse/YARN-2069 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Vinod Kumar Vavilapalli >Assignee: Mayank Bansal > Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, > YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, > YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch > > > This is different from (even if related to, and likely share code with) > YARN-2113. > YARN-2113 focuses on making sure that even if queue has its guaranteed > capacity, it's individual users are treated in-line with their limits > irrespective of when they join in. > This JIRA is about respecting user-limits while preempting containers to > balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2375) Allow enabling/disabling timeline server per framework
Jonathan Eagles created YARN-2375: - Summary: Allow enabling/disabling timeline server per framework Key: YARN-2375 URL: https://issues.apache.org/jira/browse/YARN-2375 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081227#comment-14081227 ] Hadoop QA commented on YARN-2033: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658949/YARN-2033_ALL.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 20 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4498//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4498//console This message is automatically generated. > Investigate merging generic-history into the Timeline Store > --- > > Key: YARN-2033 > URL: https://issues.apache.org/jira/browse/YARN-2033 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, > YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, > YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, > YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch > > > Having two different stores isn't amicable to generic insights on what's > happening with applications. This is to investigate porting generic-history > into the Timeline Store. > One goal is to try and retain most of the client side interfaces as close to > what we have today. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081234#comment-14081234 ] Carlo Curino commented on YARN-1707: Agreed on all of the above. {quote} I think for moving application across queue is not a ReservationSystem specific change. I would suggest to check it will not violate restrictions in target queue before moving it. {quote} This makes sense; we should compile a list of invariants to check for (I have a few in mind, but feedback is likely useful). Thanks, Carlo > Making the CapacityScheduler more dynamic > - > > Key: YARN-1707 > URL: https://issues.apache.org/jira/browse/YARN-1707 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: capacity-scheduler > Attachments: YARN-1707.patch > > > The CapacityScheduler is rather static at the moment, and refreshqueue > provides a rather heavy-handed way to reconfigure it. Moving towards > long-running services (tracked in YARN-896) and to enable more advanced > admission control and resource parcelling we need to make the > CapacityScheduler more dynamic. This is instrumental to the umbrella jira > YARN-1051. > Concretely, this requires the following changes: > * create queues dynamically > * destroy queues dynamically > * dynamically change queue parameters (e.g., capacity) > * modify refreshqueue validation to enforce sum(child.getCapacity())<= 100% > instead of ==100% > We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.2#6252)
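Editor's note: one concrete item in the list above is relaxing the refreshqueue validation from an exact-sum check to an at-most-sum check. Below is a small, self-contained sketch of such a check; the class and method names and the epsilon handling are assumptions for illustration, not the actual CapacityScheduler validation code.
{code}
import java.util.List;

class QueueCapacityValidation {
  /**
   * Relaxed check sketched from the YARN-1707 description: children of a
   * parent queue may sum to at most 100% of the parent instead of exactly 100%.
   */
  static void validateChildCapacities(List<Float> childCapacitiesPercent) {
    float sum = 0f;
    for (float c : childCapacitiesPercent) {
      sum += c;
    }
    if (sum > 100.0f + 0.001f) {   // small epsilon for float rounding
      throw new IllegalArgumentException(
          "child queue capacities sum to " + sum + "%, which exceeds 100%");
    }
  }
}
{code}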
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081207#comment-14081207 ] Craig Welch commented on YARN-1994: --- [~mipoto] Can you take a look at the latest patch? > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, > YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15.patch, > YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, > YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081203#comment-14081203 ] Xuan Gong commented on YARN-1994: - [~mipoto] Do you have any other comments for this ? > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, > YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15.patch, > YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, > YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-8.patch > CS queue level preemption should respect user-limits > > > Key: YARN-2069 > URL: https://issues.apache.org/jira/browse/YARN-2069 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Vinod Kumar Vavilapalli >Assignee: Mayank Bansal > Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, > YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, > YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch > > > This is different from (even if related to, and likely share code with) > YARN-2113. > YARN-2113 focuses on making sure that even if queue has its guaranteed > capacity, it's individual users are treated in-line with their limits > irrespective of when they join in. > This JIRA is about respecting user-limits while preempting containers to > balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081193#comment-14081193 ] Mayank Bansal commented on YARN-2069: - Hi [~wangda] , Thanks for your review comments. Updating the patch with the fix. Thanks, Mayank > CS queue level preemption should respect user-limits > > > Key: YARN-2069 > URL: https://issues.apache.org/jira/browse/YARN-2069 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Vinod Kumar Vavilapalli >Assignee: Mayank Bansal > Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, > YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, > YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch > > > This is different from (even if related to, and likely share code with) > YARN-2113. > YARN-2113 focuses on making sure that even if queue has its guaranteed > capacity, it's individual users are treated in-line with their limits > irrespective of when they join in. > This JIRA is about respecting user-limits while preempting containers to > balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2008: -- Attachment: YARN-2008.4.patch Some additional tests for direct siblings > CapacityScheduler may report incorrect queueMaxCap if there is hierarchy > queue structure > - > > Key: YARN-2008 > URL: https://issues.apache.org/jira/browse/YARN-2008 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.3.0 >Reporter: Chen He >Assignee: Craig Welch > Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, > YARN-2008.4.patch > > > If there are two queues, both allowed to use 100% of the actual resources in > the cluster. Q1 and Q2 currently use 50% of actual cluster's resources and > there is not actual space available. If we use current method to get > headroom, CapacityScheduler thinks there are still available resources for > users in Q1 but they have been used by Q2. > If the CapacityScheduelr has a hierarchy queue structure, it may report > incorrect queueMaxCap. Here is a example > ||||rootQueue|| || > | | / | > \ | > | L1ParentQueue1 | | > L1ParentQueue2| > | (allowed to use up 80% of its parent)| | (allowed to use 20% > in minimum of its parent)| > |/ | \ || > | L2LeafQueue1 |L2LeafQueue2 | | > |(50% of its parent) | (50% of its parent in minimum) | | > When we calculate headroom of a user in L2LeafQueue2, current method will > think L2LeafQueue2 can use 40% (80%*50%) of actual rootQueue resources. > However, without checking L1ParentQueue1, we are not sure. It is possible > that L1ParentQueue2 have used 40% of rootQueue resources right now. Actually, > L2LeafQueue2 can only use 30% (60%*50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2283) RM failed to release the AM container
[ https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081170#comment-14081170 ] Sunil G commented on YARN-2283: --- Thank you [~jlowe]. Yes, I have taken the thread dump and could see ThreadPoolExecutor is still there. I have applied patch and verified the same, it is not creating the same problem. Thank you. > RM failed to release the AM container > - > > Key: YARN-2283 > URL: https://issues.apache.org/jira/browse/YARN-2283 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 > Environment: NM1: AM running > NM2: Map task running > mapreduce.map.maxattempts=1 >Reporter: Nishan Shetty >Priority: Critical > > During container stability test i faced this problem > While job is running map task got killed > Observe that eventhough application is FAILED MRAppMaster process is running > till timeout because RM did not release the AM container > {code} > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > container_1405318134611_0002_01_05 Container Transitioned from RUNNING to > COMPLETED > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: > Completed container: container_1405318134611_0002_01_05 in state: > COMPLETED event:FINISHED > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos > OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS > APPID=application_1405318134611_0002 > CONTAINERID=container_1405318134611_0002_01_05 > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: > Finish information of container container_1405318134611_0002_01_05 is > written > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: > Stored the finish data of container container_1405318134611_0002_01_05 > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: > Released container container_1405318134611_0002_01_05 of capacity > on host HOST-10-18-40-153:45026, which currently has > 1 containers, used and > available, release resources=true > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: > default used= numContainers=1 user=testos > user-resources= > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: > completedContainer container=Container: [ContainerId: > container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, > NodeHttpAddress: HOST-10-18-40-153:45025, Resource: , > Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 > }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, > usedResources=, usedCapacity=0.25, > absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster= vCores:8> > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 > used= cluster= > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Re-sorting completed queue: root.default stats: default: capacity=1.0, > absoluteCapacity=1.0, usedResources=, > usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, 
numContainers=1 > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Application attempt appattempt_1405318134611_0002_01 released container > container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 > #containers=1 available=6144 used=2048 with event: FINISHED > 2014-07-14 14:43:34,924 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Updating application attempt appattempt_1405318134611_0002_01 with final > state: FINISHING > 2014-07-14 14:43:34,924 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING > 2014-07-14 14:43:34,924 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating > application application_1405318134611_0002 with final state: FINISHING > 2014-07-14 14:43:34,947 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Watcher event type: NodeDataChanged with state:SyncConnected for > path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002/app
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2033: -- Attachment: YARN-2033_ALL.4.patch > Investigate merging generic-history into the Timeline Store > --- > > Key: YARN-2033 > URL: https://issues.apache.org/jira/browse/YARN-2033 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, > YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, > YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, > YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch > > > Having two different stores isn't amicable to generic insights on what's > happening with applications. This is to investigate porting generic-history > into the Timeline Store. > One goal is to try and retain most of the client side interfaces as close to > what we have today. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2033: -- Attachment: YARN-2033.4.patch Rebase against the latest trunk, and fix some bugs > Investigate merging generic-history into the Timeline Store > --- > > Key: YARN-2033 > URL: https://issues.apache.org/jira/browse/YARN-2033 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, > YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, > YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, > YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch > > > Having two different stores isn't amicable to generic insights on what's > happening with applications. This is to investigate porting generic-history > into the Timeline Store. > One goal is to try and retain most of the client side interfaces as close to > what we have today. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically
[ https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081090#comment-14081090 ] Hadoop QA commented on YARN-2212: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658944/YARN-2212.5.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4497//console This message is automatically generated. > ApplicationMaster needs to find a way to update the AMRMToken periodically > -- > > Key: YARN-2212 > URL: https://issues.apache.org/jira/browse/YARN-2212 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2212.1.patch, YARN-2212.2.patch, > YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, > YARN-2212.5.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically
[ https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081078#comment-14081078 ] Xuan Gong commented on YARN-2212: - submit the same patch > ApplicationMaster needs to find a way to update the AMRMToken periodically > -- > > Key: YARN-2212 > URL: https://issues.apache.org/jira/browse/YARN-2212 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2212.1.patch, YARN-2212.2.patch, > YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, > YARN-2212.5.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically
[ https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2212: Attachment: YARN-2212.5.patch > ApplicationMaster needs to find a way to update the AMRMToken periodically > -- > > Key: YARN-2212 > URL: https://issues.apache.org/jira/browse/YARN-2212 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2212.1.patch, YARN-2212.2.patch, > YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch, > YARN-2212.5.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2304) Test*WebServices* fails intermittently
[ https://issues.apache.org/jira/browse/YARN-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081029#comment-14081029 ] Zhijie Shen commented on YARN-2304: --- It seems that the test failures don't happen any more. Shall we close the jira? > Test*WebServices* fails intermittently > -- > > Key: YARN-2304 > URL: https://issues.apache.org/jira/browse/YARN-2304 > Project: Hadoop YARN > Issue Type: Test >Reporter: Tsuyoshi OZAWA > Attachments: test-failure-log-RMWeb.txt > > > TestNMWebService, TestRMWebService, and TestAMWebService get failed with > address already get bind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081025#comment-14081025 ] Craig Welch commented on YARN-1994: --- [~xgong] [~arpitagarwal] [~mipoto] patch .15 should be good to go - please take a look. This is the .11 patch Xuan and Arpit already +1ed with the following two changes: Milan's logic to support overriding the hostname in bind-host + service address cases added back it - factored slightly differently to insure it does not change behavior unless these have been configured, and moved to overloaded methods in Configuration where the base logic resides. The only other change was that I also moved the getSocketAddr to Configuration as well, I had wanted to do this originally to bring it closer to the original code - I didn't bother, but since I was making changes/retesting anyway, I went ahead and did it. The new tests were changed to match. [~mipoto], I successfully tested this with an "introduced hostname" which was not the "base hostname" of the box, and it worked as desired (this overrode the used name/connect address based on bind-host + address configuration to the "introduced hostname") > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.12.patch, > YARN-1994.13.patch, YARN-1994.14.patch, YARN-1994.15.patch, > YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, > YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
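Editor's note: the bind-host mechanism discussed above separates the address a server binds to from the address clients connect to. The sketch below illustrates that idea with a plain Map standing in for the configuration; it is not Hadoop's Configuration/getSocketAddr API from the patch, and the fallback behavior shown is an assumption.
{code}
import java.net.InetSocketAddress;
import java.util.Map;

class BindHostSketch {
  /** Address clients connect to: the advertised service address. */
  static InetSocketAddress connectAddress(Map<String, String> conf) {
    String[] hostPort = conf.get("yarn.resourcemanager.address").split(":");
    return new InetSocketAddress(hostPort[0], Integer.parseInt(hostPort[1]));
  }

  /** Address the server binds to: the bind-host (often 0.0.0.0) if set, else the connect host. */
  static InetSocketAddress bindAddress(Map<String, String> conf) {
    InetSocketAddress connect = connectAddress(conf);
    String bindHost = conf.get("yarn.resourcemanager.bind-host");
    if (bindHost == null || bindHost.isEmpty()) {
      return connect;
    }
    return new InetSocketAddress(bindHost, connect.getPort());
  }
}
{code}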
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080961#comment-14080961 ] Hudson commented on YARN-2347: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1848 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1848/]) YARN-2347. Consolidated RMStateVersion and NMDBSchemaVersion into Version in yarn-server-common. Contributed by Junping Du. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1614838) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/Version.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb/VersionPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/RMStateVersion.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/RMStateVersionPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java > Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in > yarn-server-common > -
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080941#comment-14080941 ] Vinod Kumar Vavilapalli commented on YARN-2198: --- Also, a nit: WintuilsProcessStubExecutor.assumeComplete -> assertComplete? > Remove the need to run NodeManager as privileged account for Windows Secure > Container Executor > -- > > Key: YARN-2198 > URL: https://issues.apache.org/jira/browse/YARN-2198 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-2198.1.patch, YARN-2198.2.patch > > > YARN-1972 introduces a Secure Windows Container Executor. However this > executor requires a the process launching the container to be LocalSystem or > a member of the a local Administrators group. Since the process in question > is the NodeManager, the requirement translates to the entire NM to run as a > privileged account, a very large surface area to review and protect. > This proposal is to move the privileged operations into a dedicated NT > service. The NM can run as a low privilege account and communicate with the > privileged NT service when it needs to launch a container. This would reduce > the surface exposed to the high privileges. > There has to exist a secure, authenticated and authorized channel of > communication between the NM and the privileged NT service. Possible > alternatives are a new TCP endpoint, Java RPC etc. My proposal though would > be to use Windows LPC (Local Procedure Calls), which is a Windows platform > specific inter-process communication channel that satisfies all requirements > and is easy to deploy. The privileged NT service would register and listen on > an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop > with libwinutils which would host the LPC client code. The client would > connect to the LPC port (NtConnectPort) and send a message requesting a > container launch (NtRequestWaitReplyPort). LPC provides authentication and > the privileged NT service can use authorization API (AuthZ) to validate the > caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2051) Fix code bug and add more unit tests for PBImpls
[ https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080934#comment-14080934 ] Junping Du commented on YARN-2051: -- +1. Patch looks good to me. Will commit it tomorrow if no more feedback from others. > Fix code bug and add more unit tests for PBImpls > > > Key: YARN-2051 > URL: https://issues.apache.org/jira/browse/YARN-2051 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Binglin Chang >Priority: Critical > Attachments: YARN-2051.v1.patch, YARN-2051.v2.patch > > > From YARN-2016, we can see some bug could exist in PB implementation of > protocol. The bad news is most of these PBImpl don't have any unit test to > verify the info is not lost or changed after serialization/deserialization. > We should add more tests for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080936#comment-14080936 ] Vinod Kumar Vavilapalli commented on YARN-2198: --- Skimmed through the Windows native code and the common changes, look fine overall. Hoping someone with Windows knowledge ([~ivanmi]?) look at the native code and someone else ([~cnauroth]?) at the common changes more carefully. Reviewed the patch with focus on the YARN changes. Some comments follow.. bq. With a helper service the nodemanager no longer gets a free lunch of accessing the task stdout/stderr The NM never explicitly reads the stdout/stderr from the container, the streams are redirected today to their own log files according as the user's code dictates (for e.g in linux bash -c "user-command.sh 1> stderr 2>stdout"). Do we need to do this in the WintuilsProcessStubExecutor ? The LinuxContainerExecutor reads the configuration from a container-executor.cfg. We may want to unify the configuration for the executors if in another JIRA. Rename hadoopwinutilsvc* interfaces, file-names, classes to be something like WindowsContainerLauncherService or similar to be explicit? Not sure to me from the patch as to how the service's port is configured. Is it at the start time or through some configuration? bq. 1. Service Access check. Sorry for repeating what you said but if I understand correctly, we need two things (1) restricting users who can launch the special service and (2) restricting callers who can invoke the RPCs. So, this is done by the combination of the OS doing the authentication and the authorization being explicitly done by the service using the allowed list. Right? > Remove the need to run NodeManager as privileged account for Windows Secure > Container Executor > -- > > Key: YARN-2198 > URL: https://issues.apache.org/jira/browse/YARN-2198 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-2198.1.patch, YARN-2198.2.patch > > > YARN-1972 introduces a Secure Windows Container Executor. However this > executor requires a the process launching the container to be LocalSystem or > a member of the a local Administrators group. Since the process in question > is the NodeManager, the requirement translates to the entire NM to run as a > privileged account, a very large surface area to review and protect. > This proposal is to move the privileged operations into a dedicated NT > service. The NM can run as a low privilege account and communicate with the > privileged NT service when it needs to launch a container. This would reduce > the surface exposed to the high privileges. > There has to exist a secure, authenticated and authorized channel of > communication between the NM and the privileged NT service. Possible > alternatives are a new TCP endpoint, Java RPC etc. My proposal though would > be to use Windows LPC (Local Procedure Calls), which is a Windows platform > specific inter-process communication channel that satisfies all requirements > and is easy to deploy. The privileged NT service would register and listen on > an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop > with libwinutils which would host the LPC client code. The client would > connect to the LPC port (NtConnectPort) and send a message requesting a > container launch (NtRequestWaitReplyPort). 
LPC provides authentication and > the privileged NT service can use authorization API (AuthZ) to validate the > caller. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2371) Wrong NMToken is issued when NM preserving restarts with containers running
[ https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080931#comment-14080931 ] Junping Du commented on YARN-2371: -- Look at the code on trunk again. It looks like the check is already on appID rather than appAttemptID, so exception in description above shouldn't happen on latest trunk if only appAttemptID is different. [~zhiguohong], are you using trunk to have this exception or some previous released version? {code} if (!nmTokenIdentifier.getApplicationAttemptId().getApplicationId().equals( containerId.getApplicationAttemptId().getApplicationId())) { unauthorized = true; messageBuilder.append("\nNMToken for application attempt : ") .append(nmTokenIdentifier.getApplicationAttemptId()) .append(" was used for starting container with container token") .append(" issued for application attempt : ") .append(containerId.getApplicationAttemptId()); } {code} Though, the message should be improved to reflect applicationID but not attemptID. > Wrong NMToken is issued when NM preserving restarts with containers running > --- > > Key: YARN-2371 > URL: https://issues.apache.org/jira/browse/YARN-2371 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo > Attachments: YARN-2371.patch > > > When application is submitted with > "ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == > true", and NM is restarted with containers running, wrong NMToken is issued > to AM through RegisterApplicationMasterResponse. > See the NM log: > {code} > 2014-07-30 11:59:58,941 ERROR > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Unauthorized request to start container.- > NMToken for application attempt : appattempt_1406691610864_0002_01 was > used for starting container with container token issued for application > attempt : appattempt_1406691610864_0002_02 > {code} > The reason is in below code: > {code} > createAndGetNMToken(String applicationSubmitter, > ApplicationAttemptId appAttemptId, Container container) { > .. > Token token = > createNMToken(container.getId().getApplicationAttemptId(), > container.getNodeId(), applicationSubmitter); > .. > } > {code} > "appAttemptId" instead of "container.getId().getApplicationAttemptId()" > should be passed to "createNMToken". -- This message was sent by Atlassian JIRA (v6.2#6252)
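Editor's note: the issue description above pinpoints the fix as keying the NMToken by the attempt that is registering now rather than by the attempt id recorded on a container recovered across the NM restart. The following is a self-contained sketch of that distinction with simplified stand-in types; it is not the actual RM or NMTokenSecretManager code.
{code}
class NMTokenSketch {
  static String createNMToken(String attemptId, String nodeId, String user) {
    return "NMToken[" + attemptId + "@" + nodeId + " for " + user + "]";
  }

  static String createAndGetNMToken(String applicationSubmitter,
      String currentAttemptId, String recoveredContainerAttemptId, String nodeId) {
    // The buggy variant passed recoveredContainerAttemptId (the attempt the
    // recovered container was originally started under); the fix described in
    // the issue is to key the token by the attempt registering now.
    return createNMToken(currentAttemptId, nodeId, applicationSubmitter);
  }
}
{code}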
[jira] [Commented] (YARN-2371) Wrong NMToken is issued when NM preserving restarts with containers running
[ https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080925#comment-14080925 ] Junping Du commented on YARN-2371: -- Nice finding, [~zhiguohong]! The fix here looks reasonable to me. It reminds me that we also have recently changes to replace checking appAttemptID with checking appID in authorizing NMToken for the similar reason. For unit test, I suggest to have a separated test method or at least separated code segment for your case with proper document on scenario of cases. > Wrong NMToken is issued when NM preserving restarts with containers running > --- > > Key: YARN-2371 > URL: https://issues.apache.org/jira/browse/YARN-2371 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo > Attachments: YARN-2371.patch > > > When application is submitted with > "ApplicationSubmissionContext.getKeepContainersAcrossApplicationAttempts() == > true", and NM is restarted with containers running, wrong NMToken is issued > to AM through RegisterApplicationMasterResponse. > See the NM log: > {code} > 2014-07-30 11:59:58,941 ERROR > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Unauthorized request to start container.- > NMToken for application attempt : appattempt_1406691610864_0002_01 was > used for starting container with container token issued for application > attempt : appattempt_1406691610864_0002_02 > {code} > The reason is in below code: > {code} > createAndGetNMToken(String applicationSubmitter, > ApplicationAttemptId appAttemptId, Container container) { > .. > Token token = > createNMToken(container.getId().getApplicationAttemptId(), > container.getNodeId(), applicationSubmitter); > .. > } > {code} > "appAttemptId" instead of "container.getId().getApplicationAttemptId()" > should be passed to "createNMToken". -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080922#comment-14080922 ] Hudson commented on YARN-2347: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1823 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1823/]) YARN-2347. Consolidated RMStateVersion and NMDBSchemaVersion into Version in yarn-server-common. Contributed by Junping Du. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1614838) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/Version.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb/VersionPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/RMStateVersion.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/RMStateVersionPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java > Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in > yarn-server-common > ---
[jira] [Commented] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell
[ https://issues.apache.org/jira/browse/YARN-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080914#comment-14080914 ] Naganarasimha G R commented on YARN-2374: - I don't know how much this might help; see this JVM bug: http://bugs.java.com/view_bug.do?bug_id=7166687. > YARN trunk build failing TestDistributedShell.testDSShell > - > > Key: YARN-2374 > URL: https://issues.apache.org/jira/browse/YARN-2374 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2374.0.patch > > > The YARN trunk build has been failing for the last few days in the > distributed shell module. > {noformat} > testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 27.269 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2283) RM failed to release the AM container
[ https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-2283. -- Resolution: Duplicate Yes, it is very likely a duplicate of MAPREDUCE-5888, especially since it no longer reproduces on later releases. Resolving as a duplicate. The RM is not failing to release the container, rather the RM is intentionally giving the AM some time to clean things up after unregistering (i.e.: the FINISHING state). Unfortunately before MAPREDUCE-5888 was fixed the AM could hang during a failed job because of a non-daemon thread that was lingering around and preventing the JVM from shutting down. The RM eventually decides that the AM has used too much time to cleanup and kills it. > RM failed to release the AM container > - > > Key: YARN-2283 > URL: https://issues.apache.org/jira/browse/YARN-2283 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 > Environment: NM1: AM running > NM2: Map task running > mapreduce.map.maxattempts=1 >Reporter: Nishan Shetty >Priority: Critical > > During container stability test i faced this problem > While job is running map task got killed > Observe that eventhough application is FAILED MRAppMaster process is running > till timeout because RM did not release the AM container > {code} > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > container_1405318134611_0002_01_05 Container Transitioned from RUNNING to > COMPLETED > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: > Completed container: container_1405318134611_0002_01_05 in state: > COMPLETED event:FINISHED > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos > OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS > APPID=application_1405318134611_0002 > CONTAINERID=container_1405318134611_0002_01_05 > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: > Finish information of container container_1405318134611_0002_01_05 is > written > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: > Stored the finish data of container container_1405318134611_0002_01_05 > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: > Released container container_1405318134611_0002_01_05 of capacity > on host HOST-10-18-40-153:45026, which currently has > 1 containers, used and > available, release resources=true > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: > default used= numContainers=1 user=testos > user-resources= > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: > completedContainer container=Container: [ContainerId: > container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, > NodeHttpAddress: HOST-10-18-40-153:45025, Resource: , > Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 > }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, > usedResources=, usedCapacity=0.25, > absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster= vCores:8> > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > 
completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 > used= cluster= > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Re-sorting completed queue: root.default stats: default: capacity=1.0, > absoluteCapacity=1.0, usedResources=, > usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1 > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Application attempt appattempt_1405318134611_0002_01 released container > container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 > #containers=1 available=6144 used=2048 with event: FINISHED > 2014-07-14 14:43:34,924 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Updating application attempt appattempt_1405318134611_0002_01 with final > state: FINISHING > 2014-07-14 14:43:34,924 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING > 2014-07-14 14:43:34,924 INFO > org.apache.hadoop.yarn
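Editor's note: the hang described in the resolution comment above is standard JVM behavior, illustrated below with a generic example (not MRAppMaster code): any live non-daemon thread keeps the process alive after main() returns, which is why a lingering pool thread left the AM running until the RM eventually killed it.
{code}
public class NonDaemonThreadHang {
  public static void main(String[] args) {
    Thread lingering = new Thread(() -> {
      try {
        Thread.sleep(60_000L);   // stands in for a pool thread that was never shut down
      } catch (InterruptedException ignored) {
      }
    });
    // lingering.setDaemon(true);  // with this line the JVM could exit right after main()
    lingering.start();
    System.out.println("main() is about to return, but the JVM stays up for the non-daemon thread");
  }
}
{code}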
[jira] [Commented] (YARN-2051) Fix code bug and add more unit tests for PBImpls
[ https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080829#comment-14080829 ] Hadoop QA commented on YARN-2051: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658915/YARN-2051.v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4496//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4496//console This message is automatically generated. > Fix code bug and add more unit tests for PBImpls > > > Key: YARN-2051 > URL: https://issues.apache.org/jira/browse/YARN-2051 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Binglin Chang >Priority: Critical > Attachments: YARN-2051.v1.patch, YARN-2051.v2.patch > > > From YARN-2016, we can see some bug could exist in PB implementation of > protocol. The bad news is most of these PBImpl don't have any unit test to > verify the info is not lost or changed after serialization/deserialization. > We should add more tests for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell
[ https://issues.apache.org/jira/browse/YARN-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080805#comment-14080805 ] Varun Vasudev commented on YARN-2374: - >From Jenkins: {noformat} TestDistributedShell.testDSShell:193 Expected host name to start with 'asf905.gq1.ygridcore.net/67.195.81.149', was 'asf905/67.195.81.149'. Expected rpc port to be '-1', was '-1'. {noformat} It looks like the calls to NetUtils.getHostName() can return a short name or a fully qualified domain name. I'm not sure how to resolve this. The test code and the code in distributed shell app master call NetUtils.getHostName() and are getting different results. One solution could be to modify both the distributed shell app master and the test to use fully qualified domain names, but I'm open to suggestions. > YARN trunk build failing TestDistributedShell.testDSShell > - > > Key: YARN-2374 > URL: https://issues.apache.org/jira/browse/YARN-2374 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2374.0.patch > > > The YARN trunk build has been failing for the last few days in the > distributed shell module. > {noformat} > testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 27.269 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
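As a hedged sketch of the mismatch (using plain JDK calls rather than NetUtils), the snippet below shows how the short and canonical host names can differ depending on resolver configuration, and why normalizing both sides to the fully qualified name makes the comparison stable; the example host names in the comments are taken from the Jenkins output above.
{code}
import java.net.InetAddress;

public class HostNameCheck {
  public static void main(String[] args) throws Exception {
    InetAddress local = InetAddress.getLocalHost();

    String shortName = local.getHostName();          // e.g. "asf905"
    String canonical = local.getCanonicalHostName(); // e.g. "asf905.gq1.ygridcore.net"

    System.out.println("short:     " + shortName);
    System.out.println("canonical: " + canonical);

    // Comparing a short name against an FQDN fails even though both refer to
    // the same host; comparing canonical names (or resolved addresses) is stable.
    System.out.println("equal as strings? " + shortName.equals(canonical));
    System.out.println("same address?     " + InetAddress.getByName(shortName)
        .equals(InetAddress.getByName(canonical)));
  }
}
{code}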
[jira] [Commented] (YARN-2283) RM failed to release the AM container
[ https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080804#comment-14080804 ] Sunil G commented on YARN-2283: --- Seems to be duplicate to MAPREDUCE-5888 [~jlowe] cud u pls confirm whether its the same issue. > RM failed to release the AM container > - > > Key: YARN-2283 > URL: https://issues.apache.org/jira/browse/YARN-2283 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 > Environment: NM1: AM running > NM2: Map task running > mapreduce.map.maxattempts=1 >Reporter: Nishan Shetty >Priority: Critical > > During container stability test i faced this problem > While job is running map task got killed > Observe that eventhough application is FAILED MRAppMaster process is running > till timeout because RM did not release the AM container > {code} > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > container_1405318134611_0002_01_05 Container Transitioned from RUNNING to > COMPLETED > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: > Completed container: container_1405318134611_0002_01_05 in state: > COMPLETED event:FINISHED > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos > OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS > APPID=application_1405318134611_0002 > CONTAINERID=container_1405318134611_0002_01_05 > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: > Finish information of container container_1405318134611_0002_01_05 is > written > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: > Stored the finish data of container container_1405318134611_0002_01_05 > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: > Released container container_1405318134611_0002_01_05 of capacity > on host HOST-10-18-40-153:45026, which currently has > 1 containers, used and > available, release resources=true > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: > default used= numContainers=1 user=testos > user-resources= > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: > completedContainer container=Container: [ContainerId: > container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, > NodeHttpAddress: HOST-10-18-40-153:45025, Resource: , > Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 > }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, > usedResources=, usedCapacity=0.25, > absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster= vCores:8> > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 > used= cluster= > 2014-07-14 14:43:33,899 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Re-sorting completed queue: root.default stats: default: capacity=1.0, > absoluteCapacity=1.0, usedResources=, > usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1 > 2014-07-14 14:43:33,899 INFO > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Application attempt appattempt_1405318134611_0002_01 released container > container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 > #containers=1 available=6144 used=2048 with event: FINISHED > 2014-07-14 14:43:34,924 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Updating application attempt appattempt_1405318134611_0002_01 with final > state: FINISHING > 2014-07-14 14:43:34,924 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING > 2014-07-14 14:43:34,924 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating > application application_1405318134611_0002 with final state: FINISHING > 2014-07-14 14:43:34,947 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Watcher event type: NodeDataChanged with state:SyncConnected for > path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002/appattempt_1405318134611_0002_01 > for Service > org.apache.hadoop.yarn.server.resourcemanager.rec
[jira] [Updated] (YARN-2051) Fix code bug and add more unit tests for PBImpls
[ https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated YARN-2051: Attachment: YARN-2051.v2.patch Thanks for the review and comments Junping, I updated the patch addressing your comments. > Fix code bug and add more unit tests for PBImpls > > > Key: YARN-2051 > URL: https://issues.apache.org/jira/browse/YARN-2051 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Binglin Chang >Priority: Critical > Attachments: YARN-2051.v1.patch, YARN-2051.v2.patch > > > From YARN-2016, we can see some bug could exist in PB implementation of > protocol. The bad news is most of these PBImpl don't have any unit test to > verify the info is not lost or changed after serialization/deserialization. > We should add more tests for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2051) Fix code bug and add more unit tests for PBImpls
[ https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080780#comment-14080780 ] Junping Du commented on YARN-2051: -- Again, good work, [~decster]! Some comments below, most of them are trivial: {code} +System.out.printf("Validate %s %s\n", recordClass.getName(), +protoClass.getName()); {code} Please replace this, and other places that print to the console, with LOG. {code} +ret = Sets.newHashSet(genTypeValue(params[0])); {code} Please remove the unnecessary space at the end of this line. {code} throw new IllegalArgumentException("type not support: " + type); {code} Maybe "type: " + type + " is not supported" would be more readable? {code} + private static Object genByNewInstance(Class clazz) throws Exception { {code} generateNewInstance() sounds like a better name? {code} ret = newInstance.invoke(null, args); {code} The code here risks an NPE if the newInstance method was not found earlier (which is possible, as the newInstance() method is not mandatory, although most classes follow this convention). Better to add some exception handling here. {code} + } else if (clazz.equals(ByteBuffer.class)) { +// return new ByteBuffer every time +// to prevent potential side effects +return ByteBuffer.allocate(4); + } {code} What's a reasonable value to generate here for ByteBuffer? Just an empty one, isn't it? > Fix code bug and add more unit tests for PBImpls > > > Key: YARN-2051 > URL: https://issues.apache.org/jira/browse/YARN-2051 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Binglin Chang >Priority: Critical > Attachments: YARN-2051.v1.patch > > > From YARN-2016, we can see some bug could exist in PB implementation of > protocol. The bad news is most of these PBImpl don't have any unit test to > verify the info is not lost or changed after serialization/deserialization. > We should add more tests for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
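For reference, a hedged sketch (not the actual patch) of two of the review points above: logging through a LOG instance instead of printing to the console, and guarding the reflective newInstance() lookup so that a record class without the factory method fails with a clear message rather than an NPE. The class and method names here are illustrative.
{code}
import java.lang.reflect.Method;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class PBRecordReflectionSketch {
  private static final Log LOG = LogFactory.getLog(PBRecordReflectionSketch.class);

  static Object generateByNewInstance(Class<?> clazz, Object... args) throws Exception {
    LOG.info("Validating record class " + clazz.getName());

    // Look for a static newInstance(...) factory with a matching arity.
    Method newInstance = null;
    for (Method m : clazz.getMethods()) {
      if (m.getName().equals("newInstance")
          && m.getParameterTypes().length == args.length) {
        newInstance = m;
        break;
      }
    }
    if (newInstance == null) {
      // Without this guard, invoking through a null Method reference below
      // would throw a NullPointerException.
      throw new IllegalArgumentException(
          clazz.getName() + " has no matching newInstance() factory method");
    }
    return newInstance.invoke(null, args);
  }
}
{code}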
[jira] [Commented] (YARN-2051) Fix code bug and add more unit tests for PBImpls
[ https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080750#comment-14080750 ] Hadoop QA commented on YARN-2051: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655677/YARN-2051.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4495//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4495//console This message is automatically generated. > Fix code bug and add more unit tests for PBImpls > > > Key: YARN-2051 > URL: https://issues.apache.org/jira/browse/YARN-2051 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Binglin Chang >Priority: Critical > Attachments: YARN-2051.v1.patch > > > From YARN-2016, we can see some bug could exist in PB implementation of > protocol. The bad news is most of these PBImpl don't have any unit test to > verify the info is not lost or changed after serialization/deserialization. > We should add more tests for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080735#comment-14080735 ] Junping Du commented on YARN-2372: -- Nice catch, [~azuryy]! Actually, these are special punctuation characters from Chinese input, which are hard to spot. +1 on the patch. [~azuryy], any more places with the same issue? If not, I will commit it shortly. > There are Chinese Characters in the FairScheduler's document > > > Key: YARN-2372 > URL: https://issues.apache.org/jira/browse/YARN-2372 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.4.1 >Reporter: Fengdong Yu >Assignee: Fengdong Yu >Priority: Minor > Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, > YARN-2372.patch, YARN-2372.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
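To answer the "any more places" question systematically, a small hypothetical helper (not part of the patch) like the one below could scan a documentation source file and report every line containing non-ASCII characters, which is how full-width Chinese punctuation would show up; the file to check is passed as the first argument.
{code}
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class NonAsciiScanner {
  public static void main(String[] args) throws Exception {
    List<String> lines = Files.readAllLines(Paths.get(args[0]));
    for (int i = 0; i < lines.size(); i++) {
      String line = lines.get(i);
      // Report any line with a character outside the 7-bit ASCII range.
      if (!line.chars().allMatch(c -> c < 128)) {
        System.out.println((i + 1) + ": " + line);
      }
    }
  }
}
{code}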
[jira] [Commented] (YARN-2051) Fix code bug and add more unit tests for PBImpls
[ https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080727#comment-14080727 ] Junping Du commented on YARN-2051: -- Not sure if patch is still updated, manually kick off Jenkins test again. > Fix code bug and add more unit tests for PBImpls > > > Key: YARN-2051 > URL: https://issues.apache.org/jira/browse/YARN-2051 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Binglin Chang >Priority: Critical > Attachments: YARN-2051.v1.patch > > > From YARN-2016, we can see some bug could exist in PB implementation of > protocol. The bad news is most of these PBImpl don't have any unit test to > verify the info is not lost or changed after serialization/deserialization. > We should add more tests for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080722#comment-14080722 ] Hudson commented on YARN-2347: -- FAILURE: Integrated in Hadoop-Yarn-trunk #629 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/629/]) YARN-2347. Consolidated RMStateVersion and NMDBSchemaVersion into Version in yarn-server-common. Contributed by Junping Du. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1614838) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/Version.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb/VersionPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/RMStateVersion.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/RMStateVersionPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java > Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in > yarn-server-common > -
[jira] [Commented] (YARN-2051) Fix code bug and add more unit tests for PBImpls
[ https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080719#comment-14080719 ] Junping Du commented on YARN-2051: -- Forgot to mention: +1 on the idea of testing these PB objects automatically. I love it so much! :) > Fix code bug and add more unit tests for PBImpls > > > Key: YARN-2051 > URL: https://issues.apache.org/jira/browse/YARN-2051 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Binglin Chang >Priority: Critical > Attachments: YARN-2051.v1.patch > > > From YARN-2016, we can see some bug could exist in PB implementation of > protocol. The bad news is most of these PBImpl don't have any unit test to > verify the info is not lost or changed after serialization/deserialization. > We should add more tests for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
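A hedged illustration of the idea being praised here: every record is exercised by a generic "serialize, deserialize, compare" round trip rather than a hand-written test per PBImpl. The functional style below is only for the sketch; the actual patch drives the round trip via reflection over the YARN record classes.
{code}
import static org.junit.Assert.assertEquals;
import java.util.function.Function;

public final class RoundTripCheck {
  // Serialize a record to its wire form, parse it back, and assert nothing
  // was lost or changed. Relies on a meaningful equals() on the record type.
  public static <R, W> void assertRoundTrip(
      R original, Function<R, W> serialize, Function<W, R> deserialize) {
    W wireForm = serialize.apply(original);
    R restored = deserialize.apply(wireForm);
    assertEquals("information lost or changed in serialization round trip",
        original, restored);
  }
}
{code}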
[jira] [Commented] (YARN-2051) Fix code bug and add more unit tests for PBImpls
[ https://issues.apache.org/jira/browse/YARN-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080716#comment-14080716 ] Junping Du commented on YARN-2051: -- Hi [~decster], thanks for working on this. I will review your patch ASAP. > Fix code bug and add more unit tests for PBImpls > > > Key: YARN-2051 > URL: https://issues.apache.org/jira/browse/YARN-2051 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Binglin Chang >Priority: Critical > Attachments: YARN-2051.v1.patch > > > From YARN-2016, we can see some bug could exist in PB implementation of > protocol. The bad news is most of these PBImpl don't have any unit test to > verify the info is not lost or changed after serialization/deserialization. > We should add more tests for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080689#comment-14080689 ] Hadoop QA commented on YARN-2372: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658877/YARN-2372.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4494//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4494//console This message is automatically generated. > There are Chinese Characters in the FairScheduler's document > > > Key: YARN-2372 > URL: https://issues.apache.org/jira/browse/YARN-2372 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.4.1 >Reporter: Fengdong Yu >Assignee: Fengdong Yu >Priority: Minor > Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, > YARN-2372.patch, YARN-2372.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080680#comment-14080680 ] Hudson commented on YARN-2347: -- FAILURE: Integrated in Hadoop-trunk-Commit #5991 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5991/]) YARN-2347. Consolidated RMStateVersion and NMDBSchemaVersion into Version in yarn-server-common. Contributed by Junping Du. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1614838) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/Version.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/records/impl/pb/VersionPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java 
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/RMStateVersion.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/RMStateVersionPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java > Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in > yarn-server-common > ---
[jira] [Updated] (YARN-2372) There are Chinese Characters in the FairScheduler's document
[ https://issues.apache.org/jira/browse/YARN-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengdong Yu updated YARN-2372: -- Attachment: YARN-2372.patch > There are Chinese Characters in the FairScheduler's document > > > Key: YARN-2372 > URL: https://issues.apache.org/jira/browse/YARN-2372 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.4.1 >Reporter: Fengdong Yu >Assignee: Fengdong Yu >Priority: Minor > Attachments: YARN-2372.patch, YARN-2372.patch, YARN-2372.patch, > YARN-2372.patch, YARN-2372.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080669#comment-14080669 ] Hadoop QA commented on YARN-2347: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658862/YARN-2347-v6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4492//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4492//console This message is automatically generated. > Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in > yarn-server-common > > > Key: YARN-2347 > URL: https://issues.apache.org/jira/browse/YARN-2347 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, > YARN-2347-v4.patch, YARN-2347-v5.patch, YARN-2347-v6.patch, YARN-2347.patch > > > We have similar things for version state for RM, NM, TS (TimelineServer), > etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically
[ https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080665#comment-14080665 ] Hadoop QA commented on YARN-2212: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658859/YARN-2212.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebApp org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnAMRMTokenRollOver org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA org.apache.hadoop.yarn.client.TestRMFailover org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.api.impl.TestNMClient org.apache.hadoop.yarn.client.TestGetGroups org.apache.hadoop.yarn.client.TestResourceManagerAdministrationProtocolPBClientImpl org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA org.apache.hadoop.yarn.client.api.impl.TestYarnClient org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService org.apache.hadoop.yarn.server.resourcemanager.TestRMHA org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4491//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4491//console This message is automatically generated. > ApplicationMaster needs to find a way to update the AMRMToken periodically > -- > > Key: YARN-2212 > URL: https://issues.apache.org/jira/browse/YARN-2212 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2212.1.patch, YARN-2212.2.patch, > YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
[ https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenwu Peng updated YARN-1572: - Description: we have lower chance to hit NPE in allocateNodeLocal when run benchmark(hit 4 in 20 times). 2014-07-31 04:18:19,653 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_1406794589275_0001_01_21 of capacity on host datanode10:57281, which has 6 containers, used and available after allocation 2014-07-31 04:18:19,654 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:311) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:268) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:136) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:683) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:602) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:560) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:488) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:729) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:774) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:101) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599) at java.lang.Thread.run(Thread.java:662) 2014-07-31 04:18:19,655 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. was: we have lower chance to hit NPE in allocateNodeLocal when run benchmark(hit 4 in 20 times). Steps: 1. setup hadoop 2.2.0 environment 2. 
Run for i in {1..10}; do /hadoop/hadoop-smoke/bin/hadoop jar /hadoop/hadoop-smoke/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*.jar org.apache.hadoop.fs.TestDFSIO -write -nrFiles 30 -fileSize 64MB; sleep 10;done 2014-01-08 03:56:14,082 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:291) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:252) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:614) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:524) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:482) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:419) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:658) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:687) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:95) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:662) will attach log and configure files later Note: My topology file: 10.111.89.230 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com 10.111.89.231 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com 10.111.89.232 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com 10.111.89.239 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com 10.111.89.233 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com 10.111.89.234 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com 10.111.89.240 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com 10.111.89.236 /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com 10.111.89.241 /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com 10.111.89.238 /QE2/sin2-pekaurora-bdcqe048.en
[jira] [Commented] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell
[ https://issues.apache.org/jira/browse/YARN-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080655#comment-14080655 ] Hadoop QA commented on YARN-2374: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658863/apache-yarn-2374.0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4493//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4493//console This message is automatically generated. > YARN trunk build failing TestDistributedShell.testDSShell > - > > Key: YARN-2374 > URL: https://issues.apache.org/jira/browse/YARN-2374 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2374.0.patch > > > The YARN trunk build has been failing for the last few days in the > distributed shell module. > {noformat} > testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 27.269 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
[ https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenwu Peng updated YARN-1572: - Attachment: YARN-1572-log.tar.gz Thanks a lot Junping! please refer to YARN-1572-log.tar.gz for the log of NPE for latest trunk. java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:311) > Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal > -- > > Key: YARN-1572 > URL: https://issues.apache.org/jira/browse/YARN-1572 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.2.0 >Reporter: Wenwu Peng >Assignee: Wenwu Peng > Attachments: YARN-1572-log.tar.gz, conf.tar.gz, log.tar.gz > > > we have lower chance to hit NPE in allocateNodeLocal when run benchmark(hit > 4 in 20 times). > Steps: > 1. setup hadoop 2.2.0 environment > 2. Run for i in {1..10}; do /hadoop/hadoop-smoke/bin/hadoop jar > /hadoop/hadoop-smoke/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*.jar > org.apache.hadoop.fs.TestDFSIO -write -nrFiles 30 -fileSize 64MB; sleep > 10;done > 2014-01-08 03:56:14,082 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:291) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:252) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:294) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:614) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:524) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:482) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:419) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:658) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:687) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:95) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) > at java.lang.Thread.run(Thread.java:662) > will attach log and configure files later > Note: > My topology file: > 10.111.89.230 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com > 10.111.89.231 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com > 10.111.89.232 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com > 10.111.89.239 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com > 10.111.89.233 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com > 10.111.89.234 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com > 10.111.89.240 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com > 10.111.89.236 /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com > 10.111.89.241 /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com > 10.111.89.238 /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com > 10.111.89.242 /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2374) YARN trunk build failing TestDistributedShell.testDSShell
[ https://issues.apache.org/jira/browse/YARN-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2374: Attachment: apache-yarn-2374.0.patch Patch with debug information added to figure out root cause. > YARN trunk build failing TestDistributedShell.testDSShell > - > > Key: YARN-2374 > URL: https://issues.apache.org/jira/browse/YARN-2374 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2374.0.patch > > > The YARN trunk build has been failing for the last few days in the > distributed shell module. > {noformat} > testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 27.269 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:188) > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2347: - Attachment: YARN-2347-v6.patch Address latest comments from [~zjshen] in v6 patch. > Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in > yarn-server-common > > > Key: YARN-2347 > URL: https://issues.apache.org/jira/browse/YARN-2347 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, > YARN-2347-v4.patch, YARN-2347-v5.patch, YARN-2347-v6.patch, YARN-2347.patch > > > We have similar things for version state for RM, NM, TS (TimelineServer), > etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080613#comment-14080613 ] Junping Du commented on YARN-2347: -- Sounds good. Will upload a new patch soon. Thx! > Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in > yarn-server-common > > > Key: YARN-2347 > URL: https://issues.apache.org/jira/browse/YARN-2347 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, > YARN-2347-v4.patch, YARN-2347-v5.patch, YARN-2347.patch > > > We have similar things for version state for RM, NM, TS (TimelineServer), > etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080610#comment-14080610 ] Zhijie Shen commented on YARN-2347: --- Makes sense. As MR has already used Version, should we at least mark Version as \@LimitedPrivate(\{"YARN", "MAPREDUCE"\})? > Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in > yarn-server-common > > > Key: YARN-2347 > URL: https://issues.apache.org/jira/browse/YARN-2347 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, > YARN-2347-v4.patch, YARN-2347-v5.patch, YARN-2347.patch > > > We have similar things for version state for RM, NM, TS (TimelineServer), > etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)
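For reference, a hedged sketch of the annotation being suggested: the shared Version record marked as visible to YARN and MapReduce only, using Hadoop's InterfaceAudience/InterfaceStability annotations. The getters shown are illustrative placeholders, and the choice of stability tag is a separate judgment call.
{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

@InterfaceAudience.LimitedPrivate({"YARN", "MAPREDUCE"})
@InterfaceStability.Unstable
public abstract class Version {
  // Illustrative placeholders; the real record exposes its fields through
  // the protobuf-backed PBImpl rather than this exact shape.
  public abstract long getMajorVersion();

  public abstract long getMinorVersion();
}
{code}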
[jira] [Updated] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically
[ https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2212: Attachment: YARN-2212.5.patch > ApplicationMaster needs to find a way to update the AMRMToken periodically > -- > > Key: YARN-2212 > URL: https://issues.apache.org/jira/browse/YARN-2212 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2212.1.patch, YARN-2212.2.patch, > YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch, YARN-2212.5.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1572) Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal
[ https://issues.apache.org/jira/browse/YARN-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned YARN-1572: Assignee: Wenwu Peng (was: Junping Du) [~gujilangzi], are you working on this? If so, assign this JIRA to you. Please attach the log of NPE for latest trunk, I will also help to look at it. Thx! > Low chance to hit NPE issue in AppSchedulingInfo#allocateNodeLocal > -- > > Key: YARN-1572 > URL: https://issues.apache.org/jira/browse/YARN-1572 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.2.0 >Reporter: Wenwu Peng >Assignee: Wenwu Peng > Attachments: conf.tar.gz, log.tar.gz > > > we have lower chance to hit NPE in allocateNodeLocal when run benchmark(hit > 4 in 20 times). > Steps: > 1. setup hadoop 2.2.0 environment > 2. Run for i in {1..10}; do /hadoop/hadoop-smoke/bin/hadoop jar > /hadoop/hadoop-smoke/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*.jar > org.apache.hadoop.fs.TestDFSIO -write -nrFiles 30 -fileSize 64MB; sleep > 10;done > 2014-01-08 03:56:14,082 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateNodeLocal(AppSchedulingInfo.java:291) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:252) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:294) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:614) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:524) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:482) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:419) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:658) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:687) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:95) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) > at java.lang.Thread.run(Thread.java:662) > will attach log and configure files later > Note: > My topology file: > 10.111.89.230 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com > 10.111.89.231 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com > 10.111.89.232 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com > 10.111.89.239 /QE1/sin2-pekaurora-bdcqe046.eng.vmware.com > 10.111.89.233 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com > 10.111.89.234 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com > 10.111.89.240 /QE1/sin2-pekaurora-bdcqe017.eng.vmware.com > 10.111.89.236 /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com > 10.111.89.241 /QE2/sin2-pekaurora-bdcqe047.eng.vmware.com > 10.111.89.238 /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com > 10.111.89.242 /QE2/sin2-pekaurora-bdcqe048.eng.vmware.com -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080602#comment-14080602 ] Junping Du commented on YARN-2347: -- Thanks for the review and comments, [~zjshen]! That's a good point, and I agree it could be used by other applications in the future. However, until a real requirement comes in (applications don't have to follow YARN's versioning practice), let's play it safe and keep it private, as it is mostly used among YARN and built-in MR components. We can easily make a private API public in the future, but taking a public API back to private (or changing its interfaces) should never happen. So, IMO, it is better to keep it private for now. We can open a separate JIRA (and work) to discuss further if you feel strongly about making it public. Thoughts? > Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in > yarn-server-common > > > Key: YARN-2347 > URL: https://issues.apache.org/jira/browse/YARN-2347 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, > YARN-2347-v4.patch, YARN-2347-v5.patch, YARN-2347.patch > > > We have similar things for version state for RM, NM, TS (TimelineServer), > etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2212) ApplicationMaster needs to find a way to update the AMRMToken periodically
[ https://issues.apache.org/jira/browse/YARN-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080598#comment-14080598 ] Xuan Gong commented on YARN-2212: - bq. AMS#registerApplicationMaster changes not needed. Make changes on authorizeRequest() to let it return AMRMTokenIdentifier instead of RMAppAttemptId. So, all the changes in AMS#registerApplicationMaster are for this. bq. May not say stable now. {code} @Stable public abstract Token getAMRMToken(); {code} DONE bq. ApplicationReport#getAMRMToken for unmanaged AM needs to be updated as well. When the AMRMToken is rolled over, we update the AMRMToken for the current attempt, so ApplicationReport#getAMRMToken will be updated as well. bq. we can move the AMRMToken creation from RMAppAttemptImpl to AMLauncher? DONE bq. Use newInstance instead. DONE bq. Test AMRMClient automatically takes care of the new AMRMToken transfer. ADDED bq. Please run on real cluster also and set roll-over interval to a small value to make sure it actually works. Tested. > ApplicationMaster needs to find a way to update the AMRMToken periodically > -- > > Key: YARN-2212 > URL: https://issues.apache.org/jira/browse/YARN-2212 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2212.1.patch, YARN-2212.2.patch, > YARN-2212.3.1.patch, YARN-2212.3.patch, YARN-2212.4.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
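As a hedged sketch of what "the AM picks up the rolled-over AMRMToken" can look like on the client side (the exact wiring in the patch may differ), the new token received from the RM is converted to a security token and added to the current user's credentials so that later allocate() calls authenticate with it. The class and method names of the sketch itself are invented; only the UserGroupInformation and ConverterUtils calls are standard Hadoop APIs.
{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.security.AMRMTokenIdentifier;
import org.apache.hadoop.yarn.util.ConverterUtils;

public final class AMRMTokenUpdateSketch {
  // Convert the YARN-level token from the RM into a security token bound to
  // the scheduler address and add it to the current UGI, so the RPC layer
  // picks up the fresh token on subsequent calls.
  public static void updateToken(org.apache.hadoop.yarn.api.records.Token newAmrmToken,
                                 InetSocketAddress schedulerAddress) throws Exception {
    Token<AMRMTokenIdentifier> securityToken =
        ConverterUtils.convertFromYarn(newAmrmToken, schedulerAddress);
    UserGroupInformation.getCurrentUser().addToken(securityToken);
  }
}
{code}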
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080591#comment-14080591 ] duanfa commented on YARN-1149: -- I want to ask: which Hadoop version is this change based on, Hadoop 2.0.x or Hadoop 2.1.x? Please send the answer to my email duanfa1...@gmail.com. Thanks! > NM throws InvalidStateTransitonException: Invalid event: > APPLICATION_LOG_HANDLING_FINISHED at RUNNING > - > > Key: YARN-1149 > URL: https://issues.apache.org/jira/browse/YARN-1149 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ramya Sunil >Assignee: Xuan Gong > Fix For: 2.2.0 > > Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, > YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, > YARN-1149.8.patch, YARN-1149.9.patch, YARN-1149_branch-2.1-beta.1.patch > > > When the nodemanager receives a kill signal after an application has finished > execution but before log aggregation has kicked in, an > InvalidStateTransitonException: Invalid event: > APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown > {noformat} > 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just > finished : application_1377459190746_0118 > 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate > log-file for app application_1377459190746_0118 at > /app-logs/foo/logs/application_1377459190746_0118/_45454.tmp > 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService > (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation > to complete for application_1377459190746_0118 > 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for > container container_1377459190746_0118_01_04.
Current good log dirs are > /tmp/yarn/local > 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate > log-file for app application_1377459190746_0118 > 2013-08-25 20:45:00,925 WARN application.Application > (ApplicationImpl.java:handle(427)) - Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > APPLICATION_LOG_HANDLING_FINISHED at RUNNING > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) > at java.lang.Thread.run(Thread.java:662) > 2013-08-25 20:45:00,926 INFO application.Application > (ApplicationImpl.java:handle(430)) - Application > application_1377459190746_0118 transitioned from RUNNING to null > 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(463)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. > 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping > server on 8040 > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
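To make the failure mode above concrete, here is a small, self-contained sketch using the same StateMachineFactory utility from yarn-common. It is illustrative only, with made-up enum names; the actual YARN-1149 fix in ApplicationImpl may handle the event with a dedicated transition class rather than a plain self-transition.
{code}
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class LogHandlingTransitionDemo {
  enum AppState { RUNNING, FINISHED }
  enum AppEvent { APPLICATION_LOG_HANDLING_FINISHED, FINISH }

  // Because APPLICATION_LOG_HANDLING_FINISHED is registered as a RUNNING ->
  // RUNNING self-transition, dispatching it late no longer triggers
  // InvalidStateTransitonException.
  private static final StateMachineFactory<LogHandlingTransitionDemo, AppState, AppEvent, AppEvent>
      FACTORY =
        new StateMachineFactory<LogHandlingTransitionDemo, AppState, AppEvent, AppEvent>(AppState.RUNNING)
          .addTransition(AppState.RUNNING, AppState.RUNNING,
              AppEvent.APPLICATION_LOG_HANDLING_FINISHED)
          .addTransition(AppState.RUNNING, AppState.FINISHED, AppEvent.FINISH)
          .installTopology();

  public static void main(String[] args) {
    StateMachine<AppState, AppEvent, AppEvent> stateMachine =
        FACTORY.make(new LogHandlingTransitionDemo());
    // Would have thrown InvalidStateTransitonException without the transition.
    stateMachine.doTransition(AppEvent.APPLICATION_LOG_HANDLING_FINISHED, null);
    System.out.println("State after late event: " + stateMachine.getCurrentState());
  }
}
{code}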
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080587#comment-14080587 ] Zhijie Shen commented on YARN-2347: --- Sorry for raising another issue so late. When I tried to commit the patch, I realized that ShuffleHandler in the MR project has a reference to Version. Given that, {code} @Private @Unstable public abstract class Version { {code} the \@Private annotation does not seem accurate. Moreover, other applications may implement their own AuxiliaryService as well, right? In that case, their AuxiliaryService is likely to use Version the way ShuffleHandler does. Therefore, should Version be \@Public instead, and be part of o.a.h.y.api.records in hadoop-yarn-api? > Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in > yarn-server-common > > > Key: YARN-2347 > URL: https://issues.apache.org/jira/browse/YARN-2347 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, > YARN-2347-v4.patch, YARN-2347-v5.patch, YARN-2347.patch > > > We have similar version-state classes for the RM, NM, TS (TimelineServer), > etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)
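For illustration, here is roughly what the relocation suggested above could look like. This is a hypothetical sketch of the proposal, not the committed code; the newInstance/accessor shape is assumed from the server-side Version record in the patches.
{code}
package org.apache.hadoop.yarn.api.records;

import org.apache.hadoop.classification.InterfaceAudience.Public;
import org.apache.hadoop.classification.InterfaceStability.Evolving;
import org.apache.hadoop.yarn.util.Records;

// Hypothetical: Version opened up so third-party AuxiliaryService
// implementations (like ShuffleHandler) can depend on a public API.
@Public
@Evolving
public abstract class Version {

  public static Version newInstance(int majorVersion, int minorVersion) {
    Version version = Records.newRecord(Version.class);
    version.setMajorVersion(majorVersion);
    version.setMinorVersion(minorVersion);
    return version;
  }

  public abstract int getMajorVersion();
  public abstract void setMajorVersion(int majorVersion);

  public abstract int getMinorVersion();
  public abstract void setMinorVersion(int minorVersion);
}
{code}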