[jira] [Commented] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing
[ https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069443#comment-15069443 ] Rohith Sharma K S commented on YARN-4497: - Thinking about when it can happen that attempt1 is stored, attempt2 is not stored, and attempt3 is stored. One way is to manually delete the attempt2 node from ZooKeeper. > RM might fail to restart when recovering apps whose attempts are missing > > > Key: YARN-4497 > URL: https://issues.apache.org/jira/browse/YARN-4497 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Jun Gong > Assignee: Jun Gong > > Found the following problem while discussing YARN-3480. > If RM fails to store some attempts in RMStateStore, there will be missing > attempts in RMStateStore. For the case of storing attempt1, attempt2 and > attempt3, RM successfully stored attempt1 and attempt3, but failed to store > attempt2. When RM restarts, in *RMAppImpl#recover*, we recover attempts one > by one; for this case, we will recover attempt1, then attempt2. When > recovering attempt2, we call > *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, which first looks up > its ApplicationAttemptStateData but cannot find it, so an error occurs > at *assert attemptState != null* (*RMAppAttemptImpl#recover*, line 880). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
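The failure mode above can be sketched with a toy model. The class and method names below are hypothetical stand-ins, not YARN's actual code, and the real recovery path fails on `assert attemptState != null` rather than returning an id:

```java
// Hypothetical, simplified model of the recovery walk described above:
// recovery visits attempt ids in order and expects state data for each one.
class AttemptRecoverySketch {

    /** Returns the id of the first attempt whose state is missing, or -1. */
    static int firstMissingAttempt(java.util.Map<Integer, String> storedAttempts,
                                   int lastAttemptId) {
        for (int id = 1; id <= lastAttemptId; id++) {
            String attemptState = storedAttempts.get(id);
            if (attemptState == null) {
                // This is where the real code would trip
                // `assert attemptState != null` and RM restart fails.
                return id;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        java.util.Map<Integer, String> stored = new java.util.HashMap<>();
        stored.put(1, "attempt1-state"); // stored successfully
        // attempt2's store failed (e.g. transient network error) -> missing
        stored.put(3, "attempt3-state"); // stored after the network recovered
        System.out.println(firstMissingAttempt(stored, 3)); // prints 2
    }
}
```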
[jira] [Commented] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing
[ https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069454#comment-15069454 ] Jun Gong commented on YARN-4497: In *RMStateStore#notifyStoreOperationFailedInternal*, RMStateStore might skip store errors, so RMStateStore might fail to store attempt2 for some reason (e.g. a network error) while the app continues running and starts a new attempt, attempt3; RMStateStore then stores attempt3 successfully (suppose the network is OK by then).
[jira] [Commented] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing
[ https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069470#comment-15069470 ] Rohith Sharma K S commented on YARN-4497: - Currently, if any error happens while storing into RMStateStore, the RMStateStore is FENCED, so no more attempts are stored in the state store. And the RMStateStore state machine only has a transition from {{ACTIVE to FENCED}}; there is no {{FENCED to ACTIVE}}. If I am missing anything in the flow, could you explain in more detail?
[jira] [Commented] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing
[ https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069477#comment-15069477 ] Rohith Sharma K S commented on YARN-4497: - I got your point: if RM HA is not configured and fail-fast is false, this would happen.
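A minimal sketch of the behavior the thread converges on, using simplified stand-ins for the real RMStateStore logic (the class, fields, and exact branch structure here are illustrative assumptions): with HA off and fail-fast false, a store error is skipped and later attempts are still written, which is how attempt2 can be missing while attempt1 and attempt3 are present.

```java
// Illustrative model of error handling on a failed state-store write.
class StoreFailureHandlingSketch {
    enum State { ACTIVE, FENCED }

    private final boolean haEnabled;
    private final boolean failFast;
    State state = State.ACTIVE;
    final java.util.List<Integer> stored = new java.util.ArrayList<>();

    StoreFailureHandlingSketch(boolean haEnabled, boolean failFast) {
        this.haEnabled = haEnabled;
        this.failFast = failFast;
    }

    void storeAttempt(int attemptId, boolean ioSucceeds) {
        if (state == State.FENCED) {
            return;                    // a fenced store ignores further writes
        }
        if (ioSucceeds) {
            stored.add(attemptId);
            return;
        }
        if (haEnabled) {
            state = State.FENCED;      // HA: fence on the first store error
        } else if (failFast) {
            throw new IllegalStateException("RM exits (fail-fast enabled)");
        }
        // HA off and fail-fast false: the error is merely logged in the real
        // code; the app keeps running and later attempts are still stored.
    }

    public static void main(String[] args) {
        StoreFailureHandlingSketch s = new StoreFailureHandlingSketch(false, false);
        s.storeAttempt(1, true);   // stored
        s.storeAttempt(2, false);  // transient error, silently skipped
        s.storeAttempt(3, true);   // stored -> gap at attempt2
        System.out.println(s.stored); // [1, 3]
    }
}
```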
[jira] [Commented] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing
[ https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069480#comment-15069480 ] Jun Gong commented on YARN-4497: Yes, that is the problem.
[jira] [Commented] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
[ https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069537#comment-15069537 ] Sunil G commented on YARN-4352: --- Yes, it's related. I'll fix the same. > Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient > > > Key: YARN-4352 > URL: https://issues.apache.org/jira/browse/YARN-4352 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Junping Du > Assignee: Sunil G > Labels: security > Attachments: 0001-YARN-4352.patch > > > From > https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt, > we can see that the tests in TestYarnClient, TestAMRMClient and TestNMClient time > out, which can be reproduced locally.
[jira] [Commented] (YARN-4098) Document ApplicationPriority feature
[ https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069717#comment-15069717 ] Hadoop QA commented on YARN-4098: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 18s {color} | {color:red} Patch generated 1 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 0m 58s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12779251/0003-YARN-4098.patch | | JIRA Issue | YARN-4098 | | Optional Tests | asflicense mvnsite | | uname | Linux 88545b721bd1 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 882f2f0 | | asflicense | https://builds.apache.org/job/PreCommit-YARN-Build/10082/artifact/patchprocess/patch-asflicense-problems.txt | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Max memory used | 29MB | | Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10082/console | This message was automatically generated. > Document ApplicationPriority feature > > > Key: YARN-4098 > URL: https://issues.apache.org/jira/browse/YARN-4098 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, > 0002-YARN-4098.patch, 0003-YARN-4098.patch, YARN-4098.rar > > > This JIRA is to track documentation of application priority and its user, > admin and REST interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4501) Document new put APIs in TimelineClient for ATS 1.5
Junping Du created YARN-4501: Summary: Document new put APIs in TimelineClient for ATS 1.5 Key: YARN-4501 URL: https://issues.apache.org/jira/browse/YARN-4501 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Junping Du Assignee: Xuan Gong In YARN-4234, we are adding new put APIs in TimelineClient; we should document them properly.
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069621#comment-15069621 ] Hudson commented on YARN-4234: -- FAILURE: Integrated in Hadoop-trunk-Commit #9018 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9018/]) YARN-4234. New put APIs in TimelineClient for ats v1.5. Contributed by (junping_du: rev 882f2f04644a13cadb93070d5545f7a4f8691fde) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClientForATS1_5.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineWriter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineEntityGroupId.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/TimelineClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestTimelineEntityGroupId.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/FileSystemTimelineWriter.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/DirectTimelineWriter.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java > New put APIs in TimelineClient for ats v1.5 > --- > 
> Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-4234-2015-11-13.1.patch, > YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, > YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, > YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, > YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, > YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, > YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch, > YARN-4234.2015-12-21.1.patch, YARN-4234.20151109.patch, > YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4098) Document ApplicationPriority feature
[ https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069744#comment-15069744 ] Sunil G commented on YARN-4098: --- Looks good to me. Will wait for [~jianhe]'s comments as well.
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069625#comment-15069625 ] Junping Du commented on YARN-4234: -- Forgot to mention: I think we need to document the new APIs we are adding here. Just filed YARN-4501 to track this effort.
[jira] [Commented] (YARN-4098) Document ApplicationPriority feature
[ https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069702#comment-15069702 ] Hadoop QA commented on YARN-4098: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 21s {color} | {color:red} Patch generated 1 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 1m 1s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12779248/0002-YARN-4098.patch | | JIRA Issue | YARN-4098 | | Optional Tests | asflicense mvnsite | | uname | Linux 08368cfedd76 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 882f2f0 | | asflicense | https://builds.apache.org/job/PreCommit-YARN-Build/10081/artifact/patchprocess/patch-asflicense-problems.txt | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Max memory used | 30MB | | Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10081/console | This message was automatically generated. > Document ApplicationPriority feature > > > Key: YARN-4098 > URL: https://issues.apache.org/jira/browse/YARN-4098 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, > 0002-YARN-4098.patch, YARN-4098.rar > > > This JIRA is to track documentation of application priority and its user, > admin and REST interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069629#comment-15069629 ] Junping Du commented on YARN-4265: -- I just committed YARN-4234. [~gtCarrera9], would you rebase your patch on the latest trunk branch? Thanks! Hi [~jlowe], I saw your comments above: "This looks like most of the patch is a copy of the entity timeline store from YARN-3942 with a few edits, so I'm sorta reviewing my own code here. As such I did a diff of the patch from this JIRA and the one from YARN-3942 so I could focus on what's changed. I'll defer to others to review the parts that are identical to YARN-3942. Eventually I can see this being a superset of YARN-3942, since it can cache to memory and either cache everything or a subset based on what the plugins decide." Are you OK with continuing the review effort with this patch, or do you have some other preference? > Provide new timeline plugin storage to support fine-grained entity caching > -- > > Key: YARN-4265 > URL: https://issues.apache.org/jira/browse/YARN-4265 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver > Reporter: Li Lu > Assignee: Li Lu > Attachments: YARN-4265-trunk.poc_001.patch, > YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch > > > To support the newly proposed APIs in YARN-4234, we need to create a new > plugin timeline store. The store may have similar behavior to the > EntityFileTimelineStore proposed in YARN-3942, but caches data at cache-id > granularity, instead of application-id granularity. Let's have this storage > as a standalone one, instead of updating EntityFileTimelineStore, to keep the > existing store (EntityFileTimelineStore) stable.
[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069633#comment-15069633 ] Hadoop QA commented on YARN-4265: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} | {color:red} YARN-4265 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12776931/YARN-4265.YARN-4234.002.patch | | JIRA Issue | YARN-4265 | | Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10080/console | This message was automatically generated.
[jira] [Updated] (YARN-4098) Document ApplicationPriority feature
[ https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-4098: Attachment: 0003-YARN-4098.patch
[jira] [Commented] (YARN-4098) Document ApplicationPriority feature
[ https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069723#comment-15069723 ] Rohith Sharma K S commented on YARN-4098: - bq. -1 asflicense No new files are added, and the existing modified file has the ASF header.
[jira] [Commented] (YARN-4098) Document ApplicationPriority feature
[ https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069726#comment-15069726 ] Rohith Sharma K S commented on YARN-4098: - [~sunilg]/[~jianhe] kindly review the patch.
[jira] [Commented] (YARN-4098) Document ApplicationPriority feature
[ https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069691#comment-15069691 ] Rohith Sharma K S commented on YARN-4098: - Updated the patch fixing review comments.
[jira] [Updated] (YARN-4098) Document ApplicationPriority feature
[ https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-4098: Attachment: 0002-YARN-4098.patch
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070006#comment-15070006 ] Sangjin Lee commented on YARN-4224: --- Sorry, I am catching up with the discussion. Just to put my opinions on some of the questions raised so far. Regarding omitting some part of the path in the hierarchical form of the URL: bq. Sangjin Lee did you mean providing shortcuts to things like applications (instead of cluster, user, flow, flowrun, app, we can directly have cluster and app)? Yes, for example, when you query for things like all apps in a flow run, it is possible to omit things like "user" as it can be inferred from the rest of the information. Although the path is /cluster/user/flow/flow-run-id/apps, I was hoping one could do /cluster/flow/flow-run-id/apps and the server would accept it as long as it can infer the missing path from the rest of the context. The UID form, however, would have to specify all parts of the information with no exception, to eliminate any ambiguity. I hope that answers the question. Regarding creating the UID, I think we still need to make a call on whether to make the UID composition a public protocol. If we do, then potentially we don't need to return anything and don't have to worry about in which layer on the server side it will be composed. On a related note, I'm leaning against making the UID composition configurable. I don't see a whole lot of practical need to customize UID composition, and it will only cause more confusion, especially when a user/client deals with multiple clusters. On specifying the entity type along with the entity's UID, I think it would definitely be better if it were not required. My memory is a bit hazy on this, but I think there is no hard guarantee that an entity id is unique even within a parent YARN app. Entity ids are essentially up to whoever writes them, and they may choose degenerate ids. 
I think we always said only the tuple of (entity type, entity id) is unique within an application, right? So, what is the required info for uniquely locating an entity? Entity type and entity id are needed, but what about the context? App id? Any flow contexts? > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver > Affects Versions: YARN-2928 > Reporter: Varun Saxena > Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.wip.02.patch > > 
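The uniqueness point above can be illustrated with a toy key composition (purely hypothetical; not the reader's actual UID scheme or delimiter rules): only the full (app id, entity type, entity id) tuple is unambiguous, since two entities may legitimately share an id as long as their types differ.

```java
// Illustrative lookup-key composition for a timeline entity; '!' is just a
// stand-in separator, not the actual ATSv2 delimiter semantics.
class EntityKeySketch {
    static String key(String appId, String entityType, String entityId) {
        return appId + "!" + entityType + "!" + entityId;
    }

    public static void main(String[] args) {
        // Same entity id, different types: the keys do not collide.
        String a = key("app_1", "DAG", "id_0");
        String b = key("app_1", "VERTEX", "id_0");
        System.out.println(a.equals(b)); // false: the type disambiguates
    }
}
```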
[jira] [Updated] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-4479: Attachment: 0002-YARN-4479.patch > Retrospect app-priority in pendingOrderingPolicy during recovering > applications > --- > > Key: YARN-4479 > URL: https://issues.apache.org/jira/browse/YARN-4479 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, resourcemanager > Reporter: Rohith Sharma K S > Assignee: Rohith Sharma K S > Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch > > > Currently, the same ordering policy is used for pending applications and active > applications. When priority is configured for applications, during > recovery the high-priority application gets activated first. It is possible that a > low-priority job was submitted earlier and was in the running state. > This causes starvation of the low-priority job after recovery.
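The starvation concern can be sketched with two hypothetical ordering policies (the `App` class and comparators below are illustrative, not the scheduler's actual classes): priority ordering activates the later high-priority app first, while submit-time (FIFO) ordering preserves the pre-restart activation order.

```java
// Toy comparison of two activation orders for recovered (pending) apps.
class RecoveryOrderingSketch {
    static final class App {
        final String id; final int priority; final long submitTime;
        App(String id, int priority, long submitTime) {
            this.id = id; this.priority = priority; this.submitTime = submitTime;
        }
    }

    // Priority ordering: higher priority first (as for active applications).
    static java.util.List<App> byPriority(java.util.List<App> apps) {
        java.util.List<App> out = new java.util.ArrayList<>(apps);
        out.sort((a, b) -> Integer.compare(b.priority, a.priority));
        return out;
    }

    // FIFO ordering by submission time, matching pre-restart activation order.
    static java.util.List<App> bySubmitTime(java.util.List<App> apps) {
        java.util.List<App> out = new java.util.ArrayList<>(apps);
        out.sort((a, b) -> Long.compare(a.submitTime, b.submitTime));
        return out;
    }

    public static void main(String[] args) {
        java.util.List<App> apps = java.util.Arrays.asList(
            new App("low", 1, 100L),    // submitted first, was already running
            new App("high", 10, 200L)); // submitted later with higher priority
        System.out.println(byPriority(apps).get(0).id);   // "high" goes first
        System.out.println(bySubmitTime(apps).get(0).id); // "low" goes first
    }
}
```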
[jira] [Updated] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
[ https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4352: -- Attachment: 0002-YARN-4352.patch Attaching an updated patch addressing the test failures.
[jira] [Commented] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
[ https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070013#comment-15070013 ] Hadoop QA commented on YARN-4352: - *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 8m 55s | trunk passed |
| +1 | compile | 10m 47s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 10m 35s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 19s | trunk passed |
| +1 | mvnsite | 1m 16s | trunk passed |
| +1 | mvneclipse | 0m 17s | trunk passed |
| +1 | findbugs | 2m 10s | trunk passed |
| +1 | javadoc | 1m 4s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 16s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 1m 52s | the patch passed |
| +1 | compile | 10m 11s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 10m 11s | the patch passed |
| +1 | compile | 10m 59s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 10m 59s | the patch passed |
| +1 | checkstyle | 0m 18s | the patch passed |
| +1 | mvnsite | 1m 12s | the patch passed |
| +1 | mvneclipse | 0m 16s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 2m 12s | the patch passed |
| +1 | javadoc | 0m 58s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 8s | the patch passed with JDK v1.7.0_91 |
| +1 | unit | 7m 28s | hadoop-common in the patch passed with JDK v1.8.0_66. |
| +1 | unit | 7m 42s | hadoop-common in the patch passed with JDK v1.7.0_91. |
| -1 | asflicense | 0m 23s | Patch generated 1 ASF License warnings. |
| | | 82m 32s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12779269/0002-YARN-4352.patch |
| JIRA Issue | YARN-4352 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 1f5d40077910 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision
[jira] [Commented] (YARN-4400) AsyncDispatcher.waitForDrained should be final
[ https://issues.apache.org/jira/browse/YARN-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069883#comment-15069883 ] Hudson commented on YARN-4400: -- FAILURE: Integrated in Hadoop-trunk-Commit #9019 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9019/]) YARN-4400. AsyncDispatcher.waitForDrained should be final. Contributed (junping_du: rev bb5df272b9c0be9830ee8480cd33e75d26deb9d1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java * hadoop-yarn-project/CHANGES.txt > AsyncDispatcher.waitForDrained should be final > -- > > Key: YARN-4400 > URL: https://issues.apache.org/jira/browse/YARN-4400 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Trivial > Fix For: 2.8.0 > > Attachments: YARN-4400.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070009#comment-15070009 ] Naganarasimha G R commented on YARN-3367: - thanks [~djp] for looking into this issue. bq. Sounds good. I just committed YARN-4400 to trunk. Sorry, I misled with the wrong JIRA number; I actually thought YARN-4457 of [~templedf] could solve the issue. I just took a further look: *AsyncDispatcher* has been coded to handle Events only (like AsyncDispatcher's *BlockingQueue eventQueue* and EventHandler's *handle(T event)*). Hence it is not easy to replace, and reusing it for dispatching *Timeline Entities* is difficult (far too many changes for little re-usability). bq. Can TimelineEntityAsyncDispatcher be reused by other classes? If not, it is better to keep it as a private class. Though the plan is for this class to be used only by TimelineClientImpl, the class is getting cluttered with V1 and V2 code, which impacts readability; hence I thought the v2 publishing part of the code in TimelineClientImpl should be moved along with TimelineEntityAsyncDispatcher. Thoughts? > Replace starting a separate thread for post entity with event loop in > TimelineClient > > > Key: YARN-3367 > URL: https://issues.apache.org/jira/browse/YARN-3367 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > Attachments: YARN-3367-feature-YARN-2928.003.patch, > YARN-3367-feature-YARN-2928.v1.002.patch, > YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch > > > Since YARN-3039, we added a loop in TimelineClient to wait for > collectorServiceAddress to be ready before posting any entity. In consumers of > TimelineClient (like the AM), we start a new thread for each call to avoid > a potential deadlock in the main thread. This approach has at least 3 major > defects: > 1. The consumer needs additional code to wrap a thread before calling > putEntities() in TimelineClient. > 2. It costs many thread resources, which is unnecessary. > 3. The sequence of events could be out of order because each posting > thread gets out of the waiting loop randomly. > We should have something like an event loop on the TimelineClient side: > putEntities() only puts the related entities into a queue, and a separate > thread delivers the queued entities to the collector via REST calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
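The event-loop design described here (putEntities() only enqueues; one dispatcher thread delivers in order) can be sketched with a plain BlockingQueue. This is a minimal, hypothetical sketch, not the actual TimelineClient code: the class name, the String entities, and the delivery callback are all stand-ins for the real types.

```java
import java.util.concurrent.*;

// Sketch of the event-loop idea: callers enqueue, a single daemon thread
// delivers, so entity order is preserved and no per-call thread is needed.
public class EntityDispatcher implements AutoCloseable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread loop;

    EntityDispatcher(java.util.function.Consumer<String> deliver) {
        loop = new Thread(() -> {
            try {
                while (true) deliver.accept(queue.take());  // FIFO delivery
            } catch (InterruptedException ie) {
                // On shutdown, drain whatever is still queued, then exit.
                String left;
                while ((left = queue.poll()) != null) deliver.accept(left);
            }
        });
        loop.setDaemon(true);
        loop.start();
    }

    // Non-blocking from the caller's point of view -- no wrapper thread.
    void putEntity(String entity) { queue.add(entity); }

    @Override public void close() throws InterruptedException {
        loop.interrupt();
        loop.join();  // returns only after the queue is drained
    }

    public static void main(String[] args) throws Exception {
        java.util.List<String> delivered =
            java.util.Collections.synchronizedList(new java.util.ArrayList<>());
        try (EntityDispatcher d = new EntityDispatcher(delivered::add)) {
            d.putEntity("e1"); d.putEntity("e2"); d.putEntity("e3");
        }
        System.out.println(delivered);  // [e1, e2, e3]
    }
}
```

The single consumer thread is what restores ordering: with one thread per putEntities() call, threads leave the address-waiting loop in arbitrary order.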
[jira] [Commented] (YARN-4400) AsyncDispatcher.waitForDrained should be final
[ https://issues.apache.org/jira/browse/YARN-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069849#comment-15069849 ] Junping Du commented on YARN-4400: -- Nice catch, [~templedf]! +1 on the patch, committing it now. > AsyncDispatcher.waitForDrained should be final > -- > > Key: YARN-4400 > URL: https://issues.apache.org/jira/browse/YARN-4400 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Trivial > Attachments: YARN-4400.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069952#comment-15069952 ] Junping Du commented on YARN-3367: -- Thanks Naga for updating the patch! I quickly went through your patch but haven't done a deep dive yet. Quick responses to your comments above: bq. I could reuse/extend Async Dispatcher after YARN-4400 is committed to trunk. Sounds good. I just committed YARN-4400 to trunk. bq. I think it can be more organized if I can move all this related code (dispatcher code) to a new class. Can TimelineEntityAsyncDispatcher be reused by other classes? If not, it is better to keep it as a private class. bq. will work on other locations (removing the thread pools on the caller side) once the approach is finalized. Makes sense. That could make the caller code much simpler. More comments to come later. > Replace starting a separate thread for post entity with event loop in > TimelineClient > > > Key: YARN-3367 > URL: https://issues.apache.org/jira/browse/YARN-3367 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > Attachments: YARN-3367-feature-YARN-2928.003.patch, > YARN-3367-feature-YARN-2928.v1.002.patch, > YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch > > > Since YARN-3039, we added a loop in TimelineClient to wait for > collectorServiceAddress to be ready before posting any entity. In consumers of > TimelineClient (like the AM), we start a new thread for each call to avoid > a potential deadlock in the main thread. This approach has at least 3 major > defects: > 1. The consumer needs additional code to wrap a thread before calling > putEntities() in TimelineClient. > 2. It costs many thread resources, which is unnecessary. > 3. The sequence of events could be out of order because each posting > thread gets out of the waiting loop randomly. > We should have something like an event loop on the TimelineClient side: > putEntities() only puts the related entities into a queue, and a separate > thread delivers the queued entities to the collector via REST calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069951#comment-15069951 ] Varun Saxena commented on YARN-4224: To aid in tonight's discussion, I will jot down the REST endpoints added and the points to discuss. [~gtCarrera9], if you have suggestions on these endpoints, you can jot them down here as well, so that we can have a faster discussion during the call. * REST endpoints based on UID as per the current patch are as follows: {panel} *Query multiple flows* : Endpoint is */ws/v2/timeline/flows or /ws/v2/timeline/\{clusterid\}/flows*. This query will return a UID of the form *cluster:user:flowname* for each flow name. *Query multiple flowruns* : Endpoint is */ws/v2/timeline/runs/\{flow UID\}* where the flow UID is of the form *cluster:user:flowname*, i.e. the one returned by the query above. This query returns a UID of the form *cluster:user:flowname:runid* for each flow run. *Query single flowrun* : Endpoint is */ws/v2/timeline/run/\{flowrun UID\}* where the flowrun UID is of the form *cluster:user:flowname:runid*, i.e. the one returned by the query above. This query also returns a UID of the form *cluster:user:flowname:runid* for the flowrun returned. Is this required for the Web UI? *Query multiple apps in a flowrun* : Endpoint is */ws/v2/timeline/runapps/\{flowrun UID\}* where the flowrun UID is of the form *cluster:user:flowname:runid*. It is runapps because we are querying apps within a flowrun. The hierarchical endpoint has one endpoint to query apps within a flow name as well. This query also returns a UID of the form *cluster:user:flowname:runid:appid* for each app returned. *Query single app* : Endpoint is */ws/v2/timeline/app/\{app UID\}* where the app UID is of the form *cluster:user:flowname:runid:appid*, i.e. the one returned by the query above. *Query Entities* : The current endpoint is */ws/v2/timeline/entities/\{entitytype\}/\{app UID\}*. The entity type is separate because we cannot know the entity type when we query apps.
This was decided to be the endpoint when we had decided the separator would not be public. Now that it will be public, the endpoint can probably be */ws/v2/timeline/entities/\{app UID plus entity type\}*, i.e. the UID will be *cluster:user:flowname:runid:appid:entitytype*. But for this specific query, the client needs to do an extra operation on the UID returned by the previous query, unlike other endpoints. This query also returns a UID of the form *cluster:user:flowname:runid:appid:entitytype:entityid* for each entity returned. *Query Entity* : Endpoint is */ws/v2/timeline/entity/\{entity UID\}* where the entity UID is of the form *cluster:user:flowname:runid:appid:entitytype:entityid* {panel} * Need to discuss the pros and cons of filling the UID inside the storage layer versus outside it. We can add an endpoint for a single flow once offline aggregation is done. > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.wip.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
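The colon-delimited UIDs enumerated above lend themselves to a small helper for building and splitting them. The sketch below is purely illustrative (the class and method names are invented, not the real ATSv2 reader API), and a real scheme would also need escaping in case a field such as the flow name itself contains ':'.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for the cluster:user:flowname:runid:appid UID scheme
// discussed in the comment above. Not the actual timeline reader code.
public class TimelineUID {
    private static final String SEP = ":";

    // Build an app UID: cluster:user:flowname:runid:appid
    public static String appUid(String cluster, String user, String flow,
                                long runId, String appId) {
        return String.join(SEP, cluster, user, flow, Long.toString(runId), appId);
    }

    // Append the entity type, as proposed for the /entities endpoint.
    public static String entityTypeUid(String appUid, String entityType) {
        return appUid + SEP + entityType;
    }

    // Split a UID back into its components on the reader side.
    public static List<String> parse(String uid) {
        return Arrays.asList(uid.split(SEP));
    }

    public static void main(String[] args) {
        String app = appUid("cluster1", "alice", "flow1", 1L, "app_1_0001");
        System.out.println(app);  // cluster1:alice:flow1:1:app_1_0001
        System.out.println(entityTypeUid(app, "CONTAINER"));
        System.out.println(parse(app).size());  // 5
    }
}
```

Whether this joining/splitting happens inside the storage layer or outside it is exactly the open point listed for discussion.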
[jira] [Updated] (YARN-3976) Catch ApplicationNotFoundException instead of parent YarnException in YarnClient and AppReportFetcher
[ https://issues.apache.org/jira/browse/YARN-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3976: -- Fix Version/s: (was: 2.7.2) > Catch ApplicationNotFoundException instead of parent YarnException in > YarnClient and AppReportFetcher > - > > Key: YARN-3976 > URL: https://issues.apache.org/jira/browse/YARN-3976 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai >Priority: Trivial > > It is better to catch the ApplicationNotFoundException rather than the > parent YarnException and rethrow it when it is not ApplicationNotFoundException > {noformat} > catch (YarnException e) { > if (!historyServiceEnabled) { > // Just throw it as usual if historyService is not enabled. > throw e; > } > // Even if history-service is enabled, treat all exceptions still the > same > // except the following > if (!(e.getClass() == ApplicationNotFoundException.class)) { > throw e; > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
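The fix this issue proposes, catching the specific subclass instead of catching the parent and testing getClass(), can be shown with a self-contained analogue. The exception classes below are stand-ins for YARN's YarnException and ApplicationNotFoundException, not the real Hadoop types.

```java
// Analogue of the proposed change: a dedicated catch clause for the subclass
// replaces the catch-parent-then-getClass() pattern quoted above.
public class CatchSpecific {
    static class YarnLikeException extends Exception {}
    static class AppNotFoundException extends YarnLikeException {}

    static String fetch(boolean missing, boolean historyEnabled)
            throws YarnLikeException {
        try {
            if (missing) throw new AppNotFoundException();
            throw new YarnLikeException();  // some other RM-side failure
        } catch (AppNotFoundException e) {
            if (!historyEnabled) {
                throw e;  // behave as before when the history service is off
            }
            return "from-history-server";  // fall back to the history service
        }
        // Any other YarnLikeException propagates untouched -- no class checks.
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch(true, true));  // from-history-server
    }
}
```

The behavior is identical to the quoted snippet, but the compiler, rather than a runtime getClass() comparison, selects which exceptions reach the fallback path.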
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070186#comment-15070186 ] Wangda Tan commented on YARN-3870: -- Hi [~grey], Thanks for raising this; we definitely need such a mechanism to better describe our resource requests. [~asuresh], I'm not sure how the unique id works. Are you planning to add it as a key to the AppSchedulingInfo resource requests map? (e.g. {{Map =>>}}) > Providing raw container request information for fine scheduling > --- > > Key: YARN-3870 > URL: https://issues.apache.org/jira/browse/YARN-3870 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, applications, capacityscheduler, fairscheduler, > resourcemanager, scheduler, yarn >Reporter: Lei Guo > > Currently, when the AM sends container requests to the RM and scheduler, it expands > individual container requests into host/rack/any format. For instance, if I > ask for a container request with preference "host1, host2, host3", > assuming all are in the same rack rack1, instead of sending one raw container > request to the RM/Scheduler with the raw preference list, it expands it into > 5 different objects with host1, host2, host3, rack1 and any. > When the scheduler receives this information, it has already lost the raw > request. This is OK for a single container request, but it causes trouble > when dealing with multiple container requests from the same application.
> Consider this case: > 6 hosts, two racks: > rack1 (host1, host2, host3) rack2 (host4, host5, host6) > When the application requests two containers with different data-locality > preferences: > c1: host1, host2, host4 > c2: host2, host3, host5 > This will end up with the following container request list when the client sends the > request to the RM/Scheduler: > host1: 1 instance > host2: 2 instances > host3: 1 instance > host4: 1 instance > host5: 1 instance > rack1: 2 instances > rack2: 2 instances > any: 2 instances > Fundamentally, it is hard for the scheduler to make the right judgment without > knowing the raw container requests. The situation gets worse when dealing > with affinity and anti-affinity or even gang scheduling, etc. > We need some way to provide raw container request information for fine-grained > scheduling purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
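The expansion and information loss described above can be reproduced in a few lines. This is an illustrative sketch, not YARN's actual AppSchedulingInfo logic, using exactly the two-container example from the description.

```java
import java.util.*;

// Sketch of how per-container locality preferences are flattened into
// host/rack/any counts -- losing which hosts belonged to which request.
public class RequestExpansion {
    static Map<String, Integer> expand(List<List<String>> requests,
                                       Map<String, String> hostToRack) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> prefs : requests) {
            Set<String> racks = new LinkedHashSet<>();
            for (String host : prefs) {
                counts.merge(host, 1, Integer::sum);   // per-host count
                racks.add(hostToRack.get(host));
            }
            for (String rack : racks) counts.merge(rack, 1, Integer::sum);
            counts.merge("*", 1, Integer::sum);        // the "any" entry
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> topo = new HashMap<>();
        topo.put("host1", "rack1"); topo.put("host2", "rack1"); topo.put("host3", "rack1");
        topo.put("host4", "rack2"); topo.put("host5", "rack2"); topo.put("host6", "rack2");
        List<List<String>> reqs = Arrays.asList(
            Arrays.asList("host1", "host2", "host4"),   // c1
            Arrays.asList("host2", "host3", "host5"));  // c2
        // host2=2, rack1=2, rack2=2, *=2 -- matching the list in the
        // description; the c1/c2 grouping is no longer recoverable.
        System.out.println(expand(reqs, topo));
    }
}
```

Once merged, a scheduler that places a container on host2 cannot tell whether it satisfied c1 or c2, which is the core problem this JIRA raises.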
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070229#comment-15070229 ] Li Lu commented on YARN-4224: - Actually I think the /ws/v2/timeline/apps/{app UID}/entities?entityType=... format looks fine. When querying entities, the entity type is a query parameter and may not be mandatory. /ws/v2/timeline/apps/{app UID}/entities semantically should list all entities in one application. Implementation-wise, this may not be a good idea since there may be too many entities. There are solutions to this problem. For example, we can restrict /ws/v2/timeline/apps/{app UID}/entities to always return the first 100 entities. With this design, if users would like to list all CONTAINER type entities, they can add entityType as a query parameter. > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.wip.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
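The proposal in this comment, a default result cap plus an optional entityType filter, can be sketched as follows. The 100-entity limit, the Entity class, and the method names are illustrative assumptions, not the actual reader implementation.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of the proposed /apps/{app UID}/entities behavior: return at most a
// fixed first page, optionally narrowed by an entityType query parameter.
public class EntityListing {
    static final int DEFAULT_LIMIT = 100;  // assumed page size

    static final class Entity {
        final String type, id;
        Entity(String type, String id) { this.type = type; this.id = id; }
    }

    // entityTypeOrNull == null models an absent entityType query parameter.
    static List<Entity> list(List<Entity> all, String entityTypeOrNull) {
        return all.stream()
            .filter(e -> entityTypeOrNull == null || entityTypeOrNull.equals(e.type))
            .limit(DEFAULT_LIMIT)  // cap even an unfiltered listing
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entity> entities = Arrays.asList(
            new Entity("CONTAINER", "container_1"),
            new Entity("CONTAINER", "container_2"),
            new Entity("YARN_APPLICATION_ATTEMPT", "attempt_1"));
        System.out.println(list(entities, null).size());         // 3
        System.out.println(list(entities, "CONTAINER").size());  // 2
    }
}
```

The cap keeps the semantically unrestricted listing implementable, while the filter preserves the "all CONTAINER entities" use case Li Lu mentions.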
[jira] [Commented] (YARN-2882) Add ExecutionType to denote if a container execution is GUARANTEED or OPPORTUNISTIC
[ https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070329#comment-15070329 ] Wangda Tan commented on YARN-2882: -- Hi [~asuresh], Thanks for answering my question, but I still may not understand correctly: - If opportunistic/guaranteed is solely decided by the scheduler, is it possible that the AM cannot get containers with predictable behavior? For example, an LRS container should be guaranteed only. Another example is an MR job that wants its speculative tasks to be opportunistic only. - Why add a limitation that AMs can only allocate opportunistic resources (your 2nd point)? > Add ExecutionType to denote if a container execution is GUARANTEED or > OPPORTUNISTIC > --- > > Key: YARN-2882 > URL: https://issues.apache.org/jira/browse/YARN-2882 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-2882-yarn-2877.001.patch, > YARN-2882-yarn-2877.002.patch, YARN-2882-yarn-2877.003.patch, > YARN-2882-yarn-2877.004.patch, yarn-2882.patch > > > This JIRA introduces the notion of container types. > We propose two initial types of containers: guaranteed-start and queueable > containers. > Guaranteed-start are the existing containers, which are allocated by the > central RM and are started instantaneously once allocated. > Queueable is a new type of container, which allows containers to be queued in > the NM, so their execution may be arbitrarily delayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070521#comment-15070521 ] Naganarasimha G R commented on YARN-4479: - Hi [~rohithsharma], Thanks for the patch. The new approach seems better than the old one, as it avoids the additional data structure used for the same purpose, but a few points: * If we consider the FairOrderingPolicy, it first applies the {{FairComparator}} and then the {{FifoComparator}}, so only if fairness is equal will it consider whether the application was already running. So would it be better to add an additional comparator for recovery which can be used by both Fair and Fifo? * It will be left entirely to the ordering policy whether to consider the order of the recovered apps based on submission time, so it is better to get that documented so that custom ordering policies can take it into account. > Retrospect app-priority in pendingOrderingPolicy during recovering > applications > --- > > Key: YARN-4479 > URL: https://issues.apache.org/jira/browse/YARN-4479 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch > > > Currently, the same ordering policy is used for pending applications and active > applications. When priority is configured for applications, high-priority > applications get activated first during recovery. It is possible that a > low-priority job was submitted and already in the running state. > This causes the low-priority job to starve after recovery -- This message was sent by Atlassian JIRA (v6.3.4#6332)
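The extra "recovery" comparator suggested in the first point could be chained in front of the existing priority comparison, so previously running apps sort ahead of merely pending ones regardless of priority. This is a hypothetical sketch: the App class, its fields, and the comparator names are invented, not YARN's ordering-policy code.

```java
import java.util.*;

// Sketch of a recovery-aware ordering: recovered (previously running) apps
// come first; the usual priority/fifo comparison applies within each group.
public class RecoveryOrdering {
    static final class App {
        final String id; final int priority; final boolean wasRunning;
        App(String id, int priority, boolean wasRunning) {
            this.id = id; this.priority = priority; this.wasRunning = wasRunning;
        }
    }

    // Higher priority first; id stands in for submission time as a tiebreak.
    static final Comparator<App> PRIORITY_FIFO =
        Comparator.comparingInt((App a) -> -a.priority)
                  .thenComparing((App a) -> a.id);

    // Recovery comparator chained in front, usable by both Fair and Fifo:
    // wasRunning == true sorts before wasRunning == false.
    static final Comparator<App> RECOVERY_FIRST =
        Comparator.comparing((App a) -> !a.wasRunning)
                  .thenComparing(PRIORITY_FIFO);

    public static void main(String[] args) {
        List<App> apps = new ArrayList<>(Arrays.asList(
            new App("app1", 10, false),   // high priority, was only pending
            new App("app2", 1, true)));   // low priority, was already running
        apps.sort(RECOVERY_FIRST);
        // app2 is activated first despite its lower priority,
        // avoiding the post-recovery starvation this JIRA describes.
        System.out.println(apps.get(0).id);  // app2
    }
}
```

Because the recovery key is compared first, a FairOrderingPolicy could chain its fairness comparison after it without the fairness tie-break ever reordering recovered apps behind pending ones.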
[jira] [Commented] (YARN-4098) Document ApplicationPriority feature
[ https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070336#comment-15070336 ] Jian He commented on YARN-4098: --- looks good overall - this feature allows applications to be submitted and scheduled with different priorities. maybe {{this feature allows applications to be submitted and scheduled with different priorities.}} - "which is greater then" - should be "greater than" > Document ApplicationPriority feature > > > Key: YARN-4098 > URL: https://issues.apache.org/jira/browse/YARN-4098 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, > 0002-YARN-4098.patch, 0003-YARN-4098.patch, YARN-4098.rar > > > This JIRA is to track documentation of application priority and its user, > admin and REST interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070353#comment-15070353 ] Hadoop QA commented on YARN-4265: - *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 8 new or modified test files. |
| +1 | mvninstall | 8m 11s | trunk passed |
| +1 | compile | 2m 2s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 2m 15s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 29s | trunk passed |
| +1 | mvnsite | 2m 43s | trunk passed |
| +1 | mvneclipse | 0m 45s | trunk passed |
| -1 | findbugs | 3m 32s | branch/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server no findbugs output file (hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/target/findbugsXml.xml) |
| +1 | javadoc | 2m 13s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 4m 45s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 2m 24s | the patch passed |
| +1 | compile | 2m 4s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 2m 4s | the patch passed |
| +1 | compile | 2m 21s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 2m 21s | the patch passed |
| -1 | checkstyle | 0m 29s | Patch generated 54 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 292, now 345). |
| +1 | mvnsite | 2m 47s | the patch passed |
| +1 | mvneclipse | 0m 45s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| -1 | findbugs | 3m 42s | patch/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server no findbugs output file (hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/target/findbugsXml.xml) |
| +1 | javadoc | 2m 11s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 4m 46s | the patch passed with JDK v1.7.0_91 |
| +1 | unit | 0m 26s | hadoop-yarn-api in the patch passed with JDK v1.8.0_66. |
| +1 | unit | 2m 5s | hadoop-yarn-common in the patch passed with JDK v1.8.0_66. |
| -1 | unit | 79m 12s | hadoop-yarn-server in the patch failed with JDK v1.8.0_66. |
| +1 | unit | 0m 26s | hadoop-yarn-api in the patch passed with JDK v1.7.0_91. |
| +1 | unit | 2m 19s | hadoop-yarn-common
[jira] [Commented] (YARN-4156) TestAMRestart#testAMBlacklistPreventsRestartOnSameNode assumes CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070387#comment-15070387 ] Karthik Kambatla commented on YARN-4156: +1 > TestAMRestart#testAMBlacklistPreventsRestartOnSameNode assumes > CapacityScheduler > > > Key: YARN-4156 > URL: https://issues.apache.org/jira/browse/YARN-4156 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-4156.001.patch > > > The test assumes the scheduler is CapacityScheduler without configuring it as > such. This causes it to fail if the default is something else, such as the > FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070495#comment-15070495 ] Masatake Iwasaki commented on YARN-4234: [~djp], a file named "q" seems to have been accidentally added to the top directory. I'm attaching an addendum patch to remove the file. > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-4234-2015-11-13.1.patch, > YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, > YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, > YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, > YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, > YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, > YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch, > YARN-4234.2015-12-21.1.patch, YARN-4234.20151109.patch, > YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in TimelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070534#comment-15070534 ] MENG DING commented on YARN-4138: - Hi [~jianhe], which file(s) are you referring to in particular? > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4156) TestAMRestart#testAMBlacklistPreventsRestartOnSameNode assumes CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070492#comment-15070492 ] Hudson commented on YARN-4156: -- FAILURE: Integrated in Hadoop-trunk-Commit #9021 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9021/]) YARN-4156. TestAMRestart#testAMBlacklistPreventsRestartOnSameNode (kasha: rev 0af492b4bdb0356ea04e13690b78a236b82bd40c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java > TestAMRestart#testAMBlacklistPreventsRestartOnSameNode assumes > CapacityScheduler > > > Key: YARN-4156 > URL: https://issues.apache.org/jira/browse/YARN-4156 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.9.0 > > Attachments: YARN-4156.001.patch > > > The test assumes the scheduler is CapacityScheduler without configuring it as > such. This causes it to fail if the default is something else such as the > FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070575#comment-15070575 ] Sangjin Lee commented on YARN-4224: --- Regarding the ambiguity between /ws/v2/timeline/apps/\{app UID\}/entities/\{entitytype\} (UID) and /ws/v2/timeline/apps/app_id/entities/entitytype (hierarchical), doesn't the hierarchical URL need more context such as cluster/user/flow/flow-run? Is it because all of them can be omitted? At any rate, I agree that due to the possibility of omission, ambiguities are perhaps possible. In that case, I suspect using different query nouns might be the ultimate solution (e.g. "apps" for the hierarchical and "apps-uid" for UIDs). > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.wip.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4353) Provide short circuit user group mapping for NM/AM
[ https://issues.apache.org/jira/browse/YARN-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070358#comment-15070358 ] Karthik Kambatla commented on YARN-4353: bq. If secure LDAP is configured for group mapping, then there are some additional complications created by the unnecessary group resolution. Could you elaborate? What complications? I would think Vinod's suggestion here should work, albeit a more substantial change. Could you also comment on how the change here helps/hurts the long-term overall fix? > Provide short circuit user group mapping for NM/AM > -- > > Key: YARN-4353 > URL: https://issues.apache.org/jira/browse/YARN-4353 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-4353.prelim.patch > > > When the NM launches an AM, the {{ContainerLocalizer}} gets the current user > from {{UserGroupInformation}}, which triggers user group mapping, even though > the user groups are never accessed. If secure LDAP is configured for group > mapping, then there are some additional complications created by the > unnecessary group resolution. Additionally, it adds unnecessary latency to > the container launch time. > To address the issue, before getting the current user, the > {{ContainerLocalizer}} should configure {{UserGroupInformation}} with a null > group mapping service that quickly and quietly returns an empty group list > for all users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
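The proposed fix (a group mapping service that "quickly and quietly returns an empty group list for all users") can be sketched in a few lines. This is a self-contained illustration: the {{GroupMapping}} interface below is a hypothetical stand-in for Hadoop's {{GroupMappingServiceProvider}} so the sketch runs without Hadoop on the classpath.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Hadoop's GroupMappingServiceProvider interface,
// kept minimal so this sketch is self-contained.
interface GroupMapping {
    List<String> getGroups(String user);
}

// A "null" group mapping: returns an empty group list for every user,
// so no LDAP or shell lookup is ever triggered during localization.
public class NullGroupMapping implements GroupMapping {
    @Override
    public List<String> getGroups(String user) {
        return Collections.emptyList();  // never touches LDAP or the shell
    }

    public static void main(String[] args) {
        GroupMapping m = new NullGroupMapping();
        System.out.println(m.getGroups("appuser"));  // prints []
    }
}
```

Configuring the {{ContainerLocalizer}} with such a provider before it asks for the current user avoids both the latency and the secure-LDAP complications described above.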
[jira] [Created] (YARN-4503) Allow for a pluggable policy to decide if a ResourceRequest is GUARANTEED or not
Arun Suresh created YARN-4503: - Summary: Allow for a pluggable policy to decide if a ResourceRequest is GUARANTEED or not Key: YARN-4503 URL: https://issues.apache.org/jira/browse/YARN-4503 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun Suresh Assignee: Arun Suresh As per discussions on the YARN-2882 thread, specifically [this comment|https://issues.apache.org/jira/browse/YARN-2882?focusedCommentId=15065547=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15065547], we would require a pluggable policy that can decide if a ResourceRequest is GUARANTEED or OPPORTUNISTIC -- This message was sent by Atlassian JIRA (v6.3.4#6332)
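One possible shape of such a pluggable policy, sketched below. Everything here is hypothetical (the interface name, the enum, and the size-threshold rule are illustrative, not the committed API); it only shows how a deterministic, static-parameter policy would classify a request.

```java
// Illustrative names only; not the actual YARN API.
enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

interface ContainerTypePolicy {
    ExecutionType classify(String host, int memoryMb);
}

public class ThresholdPolicy implements ContainerTypePolicy {
    private final int guaranteedCapMb;

    public ThresholdPolicy(int guaranteedCapMb) {
        this.guaranteedCapMb = guaranteedCapMb;
    }

    // A policy based only on static parameters (container size here) is
    // deterministic, so AMs observe consistent behavior -- one of the points
    // discussed in the YARN-2882 thread.
    @Override
    public ExecutionType classify(String host, int memoryMb) {
        return memoryMb <= guaranteedCapMb
                ? ExecutionType.GUARANTEED
                : ExecutionType.OPPORTUNISTIC;
    }

    public static void main(String[] args) {
        ContainerTypePolicy p = new ThresholdPolicy(2048);
        System.out.println(p.classify("host1", 1024));  // GUARANTEED
        System.out.println(p.classify("host1", 4096));  // OPPORTUNISTIC
    }
}
```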
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070337#comment-15070337 ] Gera Shegalov commented on YARN-2934: - Hi [~Naganarasimha]. Thanks for updating the patch. One thing we have not addressed from my previous comments is capping the buffer size. But I now think it's good enough because we have a good small default for the tail NM_CONTAINER_STDERR_BYTES. Still, please rename: {code} - FileStatus[] listStatus = fileSystem + FileStatus[] errorStatuses = fileSystem {code} or similar; it's an array of statuses and not the status of a list. Let us have a space after ',' and a new line in: {code} - .append(StringUtils.arrayToString(errorFileNames)).append(". "); + .append(StringUtils.join(", ", errorFileNames)).append(".\n"); {code} Fix the test code accordingly. The method verifyTailErrorLogOnContainerExit can/should be private; same for the ContainerExitHandler class. Assume.assumeTrue(Shell.LINUX); should be Assume.assumeFalse(Shell.WINDOWS || Shell.OTHER); but actually, why do we need this? The test seems to be platform-independent. Assert.assertNotNull(exitEvent.getDiagnosticInfo()); seems redundant because you then have other asserts implying this already. I suggest LOG.info-ing the diagnostics instead to make the test log more useful. > Improve handling of container's stderr > --- > > Key: YARN-2934 > URL: https://issues.apache.org/jira/browse/YARN-2934 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Gera Shegalov >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, > YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, > YARN-2934.v1.006.patch, YARN-2934.v1.007.patch, YARN-2934.v1.008.patch, > YARN-2934.v2.001.patch, YARN-2934.v2.002.patch, YARN-2934.v2.003.patch > > > Most YARN applications redirect stderr to some file. That's why when > container launch fails with {{ExitCodeException}} the message is empty. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
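The join requested in the review above (space after each comma, trailing newline) can be sketched as follows. JDK {{String.join}} stands in for Hadoop's {{StringUtils.join}}, and the "Error files: " prefix and method name are illustrative, not the patch's actual code.

```java
public class DiagnosticsJoin {
    // Builds the tail of a container diagnostic message from the error-file
    // names, with a space after each comma and a trailing newline, as
    // requested in the review. JDK String.join is used here so the sketch
    // needs no Hadoop dependency.
    static String errorFileSuffix(String[] errorFileNames) {
        return new StringBuilder("Error files: ")
                .append(String.join(", ", errorFileNames))
                .append(".\n")
                .toString();
    }

    public static void main(String[] args) {
        // prints: Error files: stderr, syslog.
        System.out.print(errorFileSuffix(new String[] {"stderr", "syslog"}));
    }
}
```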
[jira] [Commented] (YARN-2882) Add ExecutionType to denote if a container execution is GUARANTEED or OPPORTUNISTIC
[ https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070360#comment-15070360 ] Arun Suresh commented on YARN-2882: --- Hey [~leftnoteasy], bq. If opportunistic/guaranteed is solely decided by scheduler.. So, it needn't be decided solely by the Scheduler. Taking into consideration YARN-1011, if oversubscription is required, yes, this will be decided by the Scheduler; else, if the NM is configured to support Distributed Scheduling, this decision can be made by the LocalScheduler, or via the application of a Policy (just created YARN-4503 to track this). bq. is it possible that AM cannot get container in predictable behavior? If the above-mentioned policy makes the decision on static parameters such as locality or container size, yes, it should be consistent; if it is more dynamic, e.g. based on load, then not so much... but we feel the AM should not need to know. bq. Why add limitation to AMs that can only allocate for opportunistic resources. Apologies if I wasn't clear. What I meant was this: if an AM is not able to specify the type of resource request, then we can also ensure that misbehaving AMs won't flood the scheduler with only GUARANTEED requests. > Add ExecutionType to denote if a container execution is GUARANTEED or > OPPORTUNISTIC > --- > > Key: YARN-2882 > URL: https://issues.apache.org/jira/browse/YARN-2882 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-2882-yarn-2877.001.patch, > YARN-2882-yarn-2877.002.patch, YARN-2882-yarn-2877.003.patch, > YARN-2882-yarn-2877.004.patch, yarn-2882.patch > > > This JIRA introduces the notion of container types. > We propose two initial types of containers: guaranteed-start and queueable > containers. 
> Guaranteed-start are the existing containers, which are allocated by the > central RM and are instantaneously started, once allocated. > Queueable is a new type of container, which allows containers to be queued in > the NM, thus their execution may be arbitrarily delayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-4234: --- Attachment: YARN-4234.addendum.patch > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-4234-2015-11-13.1.patch, > YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, > YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, > YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, > YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, > YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, > YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch, > YARN-4234.2015-12-21.1.patch, YARN-4234.20151109.patch, > YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch, > YARN-4234.addendum.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4098) Document ApplicationPriority feature
[ https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070526#comment-15070526 ] Rohith Sharma K S commented on YARN-4098: - bq. may be this feature allows applications to be submitted and scheduled with different priorities. I did not get what change is to be done; I see both sentences are the same. I think something is missing. > Document ApplicationPriority feature > > > Key: YARN-4098 > URL: https://issues.apache.org/jira/browse/YARN-4098 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, > 0002-YARN-4098.patch, 0003-YARN-4098.patch, YARN-4098.rar > > > This JIRA is to track documentation of application priority and its user, > admin and REST interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4462) FairScheduler: Disallow preemption from a queue
[ https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070557#comment-15070557 ] Hadoop QA commented on YARN-4462: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s {color} | {color:red} Patch generated 6 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 76, now 79). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 8 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 43s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 29s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 35s {color} | {color:red} Patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 154m 21s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070556#comment-15070556 ] Rohith Sharma K S commented on YARN-4479: - I had two options for doing this in the FIFO ordering policy. I took the simpler approach to get a working patch. Further improvements like this will/can be addressed in coming patches once the initial approach is agreed upon. > Retrospect app-priority in pendingOrderingPolicy during recovering > applications > --- > > Key: YARN-4479 > URL: https://issues.apache.org/jira/browse/YARN-4479 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch > > > Currently, the same ordering policy is used for pending applications and active > applications. When priority is configured for applications, during > recovery high priority applications get activated first. It is possible that a > low priority job was submitted and in running state. > This causes starvation of the low priority job after recovery -- This message was sent by Atlassian JIRA (v6.3.4#6332)
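The starvation described in the issue can be modeled with a toy example. The {{App}} class, field names, and the submission-order fallback below are illustrative only, not the patch's actual types or the agreed-upon fix.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RecoveryOrdering {
    static class App {
        final String id; final int priority; final long submitTime;
        App(String id, int priority, long submitTime) {
            this.id = id; this.priority = priority; this.submitTime = submitTime;
        }
    }

    // Returns which pending app would be activated first after a restart.
    static String firstActivated(List<App> pending, boolean priorityOnly) {
        List<App> copy = new ArrayList<>(pending);
        if (priorityOnly) {
            // today's behavior: higher priority first, even during recovery
            copy.sort(Comparator.comparingInt((App a) -> a.priority).reversed());
        } else {
            // hypothetical recovery-aware alternative: submission order first,
            // so an app that was already running before the restart resumes first
            copy.sort(Comparator.comparingLong(a -> a.submitTime));
        }
        return copy.get(0).id;
    }

    public static void main(String[] args) {
        List<App> pending = new ArrayList<>();
        pending.add(new App("lowPriorityRunningBeforeRestart", 1, 100));
        pending.add(new App("highPrioritySubmittedLater", 5, 200));
        System.out.println(firstActivated(pending, true));   // highPrioritySubmittedLater
        System.out.println(firstActivated(pending, false));  // lowPriorityRunningBeforeRestart
    }
}
```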
[jira] [Commented] (YARN-4496) Improve HA ResourceManager Failover detection on the client
[ https://issues.apache.org/jira/browse/YARN-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070140#comment-15070140 ] Subru Krishnan commented on YARN-4496: -- +1. Thanks [~asuresh] for initiating this. To add more context, our deployments are fairly large with multiple secondaries which is resulting in considerable connection latencies based on the current failover proxy. > Improve HA ResourceManager Failover detection on the client > --- > > Key: YARN-4496 > URL: https://issues.apache.org/jira/browse/YARN-4496 > Project: Hadoop YARN > Issue Type: Improvement > Components: client, resourcemanager >Reporter: Arun Suresh >Assignee: Arun Suresh > > HDFS deployments can currently use the {{RequestHedgingProxyProvider}} to > improve Namenode failover detection in the client. It does this by > concurrently trying all namenodes and picks the namenode that returns the > fastest with a successful response as the active node. > It would be useful to have a similar ProxyProvider for the Yarn RM (it can > possibly be done by converging some the class hierarchies to use the same > ProxyProvider) > This would especially be useful for large YARN deployments with multiple > standby RMs where clients will be able to pick the active RM without having > to traverse a list of configured RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
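The hedging pattern behind {{RequestHedgingProxyProvider}} (probe all candidates concurrently, take the first success) can be sketched with {{ExecutorService.invokeAny}}. The endpoints below are simulated callables; this is not the actual proxy-provider code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HedgingDemo {
    // Fires the same probe at every candidate endpoint concurrently and
    // returns the first successful answer; slow or failing standbys are
    // simply abandoned.
    static String firstSuccessful(List<Callable<String>> probes) {
        ExecutorService pool = Executors.newFixedThreadPool(probes.size());
        try {
            // invokeAny returns the result of the first task that completes
            // without throwing, and cancels the rest
            return pool.invokeAny(probes);
        } catch (Exception e) {
            throw new RuntimeException("no endpoint answered", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Callable<String> standby = () -> { throw new Exception("standby RM refused"); };
        Callable<String> active = () -> "rm2";  // the active RM answers
        System.out.println(firstSuccessful(Arrays.asList(standby, active)));  // rm2
    }
}
```

With multiple standbys, the client no longer pays the cost of walking a configured RM list one entry at a time, which is the latency problem described above.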
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070178#comment-15070178 ] Subru Krishnan commented on YARN-3870: -- +1 on this. Thanks [~grey] for raising this. I have been having offline discussions with [~asuresh] and [~curino] around Distributed Scheduling (YARN-2877) and Federation (YARN-2915). In both scenarios, sending the raw container request and letting the RM expand will save us a lot of pain as currently we are finding it very difficult to route requests correctly in the AMRMProxy (YARN-2844) > Providing raw container request information for fine scheduling > --- > > Key: YARN-3870 > URL: https://issues.apache.org/jira/browse/YARN-3870 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, applications, capacityscheduler, fairscheduler, > resourcemanager, scheduler, yarn >Reporter: Lei Guo > > Currently, when AM sends container requests to RM and scheduler, it expands > individual container requests into host/rack/any format. For instance, if I > am asking for container request with preference "host1, host2, host3", > assuming all are in the same rack rack1, instead of sending one raw container > request to RM/Scheduler with raw preference list, it basically expand it to > become 5 different objects with host1, host2, host3, rack1 and any in there. > When scheduler receives information, it basically already lost the raw > request. This is ok for single container request, but it will cause trouble > when dealing with multiple container requests from the same application. 
> Consider this case: > 6 hosts, two racks: > rack1 (host1, host2, host3) rack2 (host4, host5, host6) > When the application requests two containers with different data locality > preferences: > c1: host1, host2, host4 > c2: host2, host3, host5 > This will end up with the following container request list when the client sends the > request to RM/Scheduler: > host1: 1 instance > host2: 2 instances > host3: 1 instance > host4: 1 instance > host5: 1 instance > rack1: 2 instances > rack2: 2 instances > any: 2 instances > Fundamentally, it is hard for the scheduler to make the right judgment without > knowing the raw container request. The situation will get worse when dealing > with affinity and anti-affinity or even gang scheduling etc. > We need some way to provide raw container request information for fine > scheduling purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
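The expansion described in the example above can be sketched directly; running it reproduces the host/rack/any counts listed in the issue. The method and class names are illustrative, not the AMRMClient's actual code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RequestExpansion {
    // Expands per-container host preference lists into the aggregate
    // host/rack/any counts the RM actually sees; by the time the scheduler
    // looks at this map, the original per-container grouping is gone.
    static Map<String, Integer> expand(List<List<String>> requests,
                                       Map<String, String> hostToRack) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> hosts : requests) {
            for (String h : hosts) {
                counts.merge(h, 1, Integer::sum);
            }
            // one rack entry per distinct rack touched by this request
            hosts.stream().map(hostToRack::get).distinct()
                 .forEach(r -> counts.merge(r, 1, Integer::sum));
            counts.merge("any", 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> racks = new LinkedHashMap<>();
        for (String h : new String[] {"host1", "host2", "host3"}) racks.put(h, "rack1");
        for (String h : new String[] {"host4", "host5", "host6"}) racks.put(h, "rack2");
        Map<String, Integer> c = expand(Arrays.asList(
                Arrays.asList("host1", "host2", "host4"),   // c1
                Arrays.asList("host2", "host3", "host5")),  // c2
                racks);
        // matches the issue: host2=2, rack1=2, rack2=2, any=2, rest 1
        System.out.println(c.get("host2") + " " + c.get("rack1") + " " + c.get("any"));  // 2 2 2
    }
}
```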
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070212#comment-15070212 ] Varun Saxena commented on YARN-4224: Thanks [~leftnoteasy]. For the entities endpoint, would /ws/v2/timeline/apps/\{app UID\}/\{entitytype\} be fine for the UI? This would be a slight deviation from other endpoints because the entity type cannot be put as part of the UID in the previous (parent) response. For querying app attempts, the entity type will be YARN_APP_ATTEMPT, and for containers it will be YARN_CONTAINER, i.e. the endpoints will basically be /ws/v2/timeline/apps/\{app UID\}/YARN_APP_ATTEMPT and /ws/v2/timeline/apps/\{app UID\}/YARN_CONTAINER respectively. I don't think the UI will be displaying all possible generic entity types; only app attempts and containers will be required. > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.wip.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070218#comment-15070218 ] Varun Saxena commented on YARN-4224: Another option would be to make the entities endpoint {{/ws/v2/timeline/apps/\{app UID\}/entities?entityType=...}}. However, this will be a mandatory param (there will be a check at server side). Please note that the hierarchical REST endpoint has been kept as {{/ws/v2/timeline/apps/\{appid\}/entities/\{entitytype\}}}. Please note app UID and app id are not the same thing. We need some differentiation between the UID endpoint and the hierarchical endpoint because if we follow the general scheme the endpoints will clash. Although mandatory params in REST are generally path params, I guess we have no other option here. For UID, we can put the entity type as a query param and for the hierarchical endpoint as a path param. It's confusing anyway. Or should we have endpoints like {{/ws/v2/timeline/runsUID/\{run UID\}/apps}}, {{/ws/v2/timeline/appsUID/\{app UID\}}}, {{/ws/v2/timeline/appsUID/\{app UID\}/entities/\{entitytype\}}}, thereby clearly indicating that a UID is being passed and avoiding the conflict mentioned above. Thoughts? > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.wip.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
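The clash being debated can be made concrete with a tiny classifier over the {{/apps/...}} path segment. This is purely illustrative: the '!'-separated UID format is an assumption made for this sketch, not a committed ATSv2 format, and real app id validation is stricter than this regex.

```java
public class EndpointKind {
    // Illustrative only: tells a plain YARN application id apart from a
    // UID-style path segment. The '!' separator is an assumption of this
    // sketch; segments that are neither are exactly the ambiguity that
    // motivates separate query nouns (apps vs. appsUID).
    static String classify(String segment) {
        if (segment.matches("application_\\d+_\\d+")) {
            return "hierarchical";  // e.g. /apps/application_1450000000000_0001/...
        }
        if (segment.contains("!")) {
            return "uid";           // e.g. /apps/cluster!user!flow!1!app/...
        }
        return "ambiguous";
    }

    public static void main(String[] args) {
        System.out.println(classify("application_1450000000000_0001"));  // hierarchical
        System.out.println(classify("cluster!user!flow!1!application_1450000000000_0001"));  // uid
    }
}
```

Separate nouns ({{appsUID}} vs. {{apps}}) remove the need for any such guessing, which is why they come up as the cleaner option in the discussion.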
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070243#comment-15070243 ] Li Lu commented on YARN-4224: - Yes, I think this is fine for entities. The root cause of this is that entities need both id and type to be uniquely identified. For UID-based queries we can pass the type as a query parameter. For the hierarchical endpoints, type is modeled as a part of entity ids (we have to do this to uniquely identify an entity). The clash will happen if we hit the .../apps endpoint, and we have to distinguish those two cases. > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.wip.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070302#comment-15070302 ] Wangda Tan commented on YARN-4224: -- Hi [~varun_saxena], [~gtCarrera], bq. Currently query without entity type is not supported I feel that we should split API design and internal implementation; it is quite possible that the web UI wants to make a single RPC call, pull richer application entities (aka, all entities in one app), and render charts locally. It's fine if the current implementation doesn't support it; we can return a bad response if we cannot support it now. But it will be important to make an extensible REST API so that we can support it in the future without semantics change. Thoughts? > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.wip.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070302#comment-15070302 ] Wangda Tan edited comment on YARN-4224 at 12/23/15 11:13 PM: - Hi [~varun_saxena], [~gtCarrera], bq. Currently query without entity type is not supported I feel that we should split API design and internal implementation; it is quite possible that the web UI wants to make a single REST call, pull richer application entities (aka, all entities in one app), and render charts locally. It's fine if the current implementation doesn't support it; we can return a bad response if we cannot support it now. But it will be important to make an extensible REST API so that we can support it in the future without semantics change. Thoughts? was (Author: leftnoteasy): Hi [~varun_saxena], [~gtCarrera], bq. Currently query without entity type is not supported I feel that we should split API-design and internal implementation, it is quite possible that web UI wants to make a single RPC call, pull more rich application entities (aka, all entities in one app), and render charts locally. It's fine if the currently implementation doesn't support it, we can return bad response if we cannot support now. But it will be important to make a extensible REST API that we can support it in the future without semantics change. Thoughts? > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.wip.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070141#comment-15070141 ] Varun Saxena commented on YARN-4224: Sorry; for entities, we cannot really have an endpoint such as /ws/v2/timeline/apps/\{app UID\}/entities/\{entitytype\} because it would clash with the hierarchical endpoint for entities. > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.wip.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4265: Attachment: YARN-4265-trunk.001.patch Thanks [~djp]! I just rebased my patch to the latest trunk. > Provide new timeline plugin storage to support fine-grained entity caching > -- > > Key: YARN-4265 > URL: https://issues.apache.org/jira/browse/YARN-4265 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4265-trunk.001.patch, > YARN-4265-trunk.poc_001.patch, YARN-4265.YARN-4234.001.patch, > YARN-4265.YARN-4234.002.patch > > > To support the newly proposed APIs in YARN-4234, we need to create a new > plugin timeline store. The store may have similar behavior to the > EntityFileTimelineStore proposed in YARN-3942, but cache data at cache-id > granularity instead of application-id granularity. Let's have this storage > as a standalone one, instead of updating EntityFileTimelineStore, to keep the > existing store (EntityFileTimelineStore) stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4265: Attachment: (was: YARN-4265-trunk.poc_001.patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070198#comment-15070198 ] Wangda Tan commented on YARN-4224: -- Thanks [~varun_saxena], Synced with [~gtCarrera] about this. I think it's fine to have two hierarchies ({{timeline/\{parent\}/children}}) to locate entities such as apps within a flow, or flowruns within a flow. I don't have a strong opinion between the two-hierarchy API or adding the parent id as a query parameter ({{timeline/apps/flowrun=\{flowrun_uid\}}}). The most important thing to me for the REST API is allowing a client to locate a single object at one hierarchy level (such as {{timeline/flowruns/\{flowrun_uid\}}}). I think we're on the same page for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070198#comment-15070198 ] Wangda Tan edited comment on YARN-4224 at 12/23/15 9:11 PM: Thanks [~varun_saxena], Synced with [~gtCarrera] about this. I think it's fine to have two hierarchies ({{timeline/\{parent\}/children}}) to locate entities such as apps within a flow, or flowruns within a flow. I don't have a strong opinion between the two-hierarchy API or adding the parent id as a query parameter ({{timeline/apps/?flowrun=\{flowrun_uid\}}}). The most important thing to me for the REST API is allowing a client to locate a single object at one hierarchy level (such as {{timeline/flowruns/\{flowrun_uid\}}}). I think we're on the same page for this. was (Author: leftnoteasy): Thanks [~varun_saxena], Synced with [~gtCarrera] about this. I think it's fine to have two hierarchies ({{timeline/\{parent\}/children}}) to locate entities such as apps within a flow, or flowruns within a flow. I don't have a strong opinion between the two-hierarchy API or adding the parent id as a query parameter ({{timeline/apps/flowrun=\{flowrun_uid\}}}). The most important thing to me for the REST API is allowing a client to locate a single object at one hierarchy level (such as {{timeline/flowruns/\{flowrun_uid\}}}). I think we're on the same page for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070232#comment-15070232 ] Varun Saxena commented on YARN-4224: Well, for the hierarchical endpoint we have something like {{/ws/v2/timeline/apps/\{appid\}/entities/\{entityType\}}} as the endpoint. Shouldn't they be consistent? And if they are consistent, they will clash. Maybe for UID we can go with a query param for the entity type, because the UID endpoint will primarily be called from the UI and the entity type will always be supplied. The default limit for the number of entities is 100. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling
[ https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070279#comment-15070279 ] Arun Suresh commented on YARN-3870: --- [~leftnoteasy], With respect to the AM, I was thinking that just having it as a field in the ResourceRequest as well as the Container (returned by the allocate call) would suffice. From the perspective of the Scheduler, yes, {{Map =>>}} was the direction I was thinking. Correct me if I am wrong, but currently there is an implicit understanding that all resource requests for the same resource requirement should have the same priority. Having an explicit request id would allow us to remove that constraint as well. > Providing raw container request information for fine scheduling > --- > > Key: YARN-3870 > URL: https://issues.apache.org/jira/browse/YARN-3870 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, applications, capacityscheduler, fairscheduler, > resourcemanager, scheduler, yarn >Reporter: Lei Guo > > Currently, when the AM sends container requests to the RM and scheduler, it expands > individual container requests into host/rack/any format. For instance, if I > am asking for a container request with preference "host1, host2, host3", > assuming all are in the same rack rack1, instead of sending one raw container > request to the RM/Scheduler with the raw preference list, it basically expands it > into 5 different objects: host1, host2, host3, rack1 and any. > By the time the scheduler receives the information, it has already lost the raw > request. This is OK for a single container request, but it causes trouble > when dealing with multiple container requests from the same application. 
> Consider this case: > 6 hosts, two racks: > rack1 (host1, host2, host3) rack2 (host4, host5, host6) > When the application requests two containers with different data locality > preferences: > c1: host1, host2, host4 > c2: host2, host3, host5 > this ends up with the following container request list when the client sends the > request to the RM/Scheduler: > host1: 1 instance > host2: 2 instances > host3: 1 instance > host4: 1 instance > host5: 1 instance > rack1: 2 instances > rack2: 2 instances > any: 2 instances > Fundamentally, it is hard for the scheduler to make the right judgment without > knowing the raw container requests. The situation gets worse when dealing > with affinity and anti-affinity, or even gang scheduling, etc. > We need some way to provide raw container request information for fine > scheduling purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
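The flattening described in YARN-3870 can be sketched in a few lines (an illustrative Python sketch, not Hadoop code; the host-to-rack layout and the two requests c1/c2 come from the example above):

```python
from collections import Counter

# Host-to-rack layout from the YARN-3870 example.
RACK_OF = {"host1": "rack1", "host2": "rack1", "host3": "rack1",
           "host4": "rack2", "host5": "rack2", "host6": "rack2"}

def expand(requests):
    """Flatten per-container host-preference lists into the aggregate
    host/rack/any counts the AM actually sends; the per-request grouping
    (the 'raw' request) is lost in this aggregation."""
    counts = Counter()
    for hosts in requests:
        for h in hosts:                          # one entry per preferred host
            counts[h] += 1
        for r in {RACK_OF[h] for h in hosts}:    # one entry per distinct rack
            counts[r] += 1
        counts["any"] += 1                       # one 'any' entry per request
    return dict(counts)

# c1 and c2 from the example; reproduces the table in the description.
flattened = expand([["host1", "host2", "host4"],
                    ["host2", "host3", "host5"]])
```

Once flattened, there is no way to recover from `flattened` that host2 appeared in both requests rather than twice in one, which is exactly the information loss the JIRA is about.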
[jira] [Updated] (YARN-4502) Sometimes Two AM containers get launched
[ https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4502: - Labels: 2.6.4-candidate (was: ) > Sometimes Two AM containers get launched > > > Key: YARN-4502 > URL: https://issues.apache.org/jira/browse/YARN-4502 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Wangda Tan >Priority: Critical > Labels: 2.6.4-candidate > > Scenario : > * set yarn.resourcemanager.am.max-attempts = 2 > * start dshell application > {code} > yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar > hadoop-yarn-applications-distributedshell-*.jar > -attempt_failures_validity_interval 6 -shell_command "sleep 150" > -num_containers 16 > {code} > * Kill AM pid > * Print container list for 2nd attempt > {code} > yarn container -list appattempt_1450825622869_0001_02 > INFO impl.TimelineClientImpl: Timeline service address: > http://xxx:port/ws/v1/timeline/ > INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10: > Total number of containers :2 > Container-Id Start Time Finish Time > StateHost Node Http Address >LOG-URL > container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa > container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa > {code} > * look for new AM pid > Here, the 2nd AM container was supposed to be started on > container_e12_1450825622869_0001_02_01. But the AM was not launched on > container_e12_1450825622869_0001_02_01; it was in ACQUIRED state. > On the other hand, container_e12_1450825622869_0001_02_02 got the AM running. > Expected behavior: the RM should not start 2 containers for starting the AM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4502) Sometimes Two AM containers get launched
[ https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4502: - Labels: (was: 2.6.4-candidate) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4083) Add a discovery mechanism for the scheduler address
[ https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070131#comment-15070131 ] Arun Suresh commented on YARN-4083: --- Hoping to get some consensus on this, since it is required for YARN-2877 as well. I feel having the ContainerExecutor expose a *YARN_SCHEDULER_ADDRESS* environment variable (and, as [~jianhe] mentioned, maybe let it be a list, with the first entry being the local NM and the remaining entries RM addresses to allow for failover) should work across Java and non-Java applications. It would also be somewhat dynamic, as [~steve_l] mentioned, since the value is decided by the NM right before it launches a container, but unlike a global/ZK-based registry, it can be different for different containers/applications (although it would not change during the lifetime of the container). bq. how do AM IP filters know when to bounce an HTTP Request over to the proxy My understanding (at least for our requirement in YARN-2877) is that this would be used by the AM specifically for resolving the address of the server end of the ApplicationMasterProtocol, so HTTP addresses could probably be specified via another env variable, maybe? bq. How does this work when the container is actually a Linux container and not a fake yarn-level container ? [~aw], apologies if I did not fully understand, but I feel an environment variable should be accessible by Linux, Windows and other containers. Thoughts? > Add a discovery mechanism for the scheduler address > --- > > Key: YARN-4083 > URL: https://issues.apache.org/jira/browse/YARN-4083 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > > Today many apps like Distributed Shell, REEF, etc. rely on the fact that the > HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler > address. 
This JIRA proposes the addition of an explicit discovery mechanism > for the scheduler address. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
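As a sketch of how an AM might consume the proposed variable (the variable name comes from the comment above; the comma-separated format with the local NM first and RM addresses after it is an assumption for illustration, and the JIRA does not fix a wire format):

```python
import os

def parse_scheduler_addresses(env=None):
    """Parse a hypothetical YARN_SCHEDULER_ADDRESS list: the first entry
    is assumed to be the local NM, the rest RM addresses for failover."""
    env = os.environ if env is None else env
    raw = env.get("YARN_SCHEDULER_ADDRESS", "")
    addrs = [a.strip() for a in raw.split(",") if a.strip()]
    if not addrs:
        return None, []          # variable absent: caller falls back to config
    return addrs[0], addrs[1:]   # (local NM address, RM failover list)

# Example with a fabricated environment, as the NM might set it at launch.
local_nm, rms = parse_scheduler_addresses(
    {"YARN_SCHEDULER_ADDRESS": "nm-host:45454, rm1:8030, rm2:8030"})
```

Because the NM computes the value right before launching each container, each container can see a different list without any global registry, which is the property the comment highlights.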
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070133#comment-15070133 ] Varun Saxena commented on YARN-4224: Based on tonight's discussion, the UID endpoints could look as follows: {panel} *Query multiple flows* : Endpoint is */ws/v2/timeline/flows* or */ws/v2/timeline/\{clusterid\}/flows*. This query returns a UID of the form *cluster:user:flowname* for each flow name. *Query multiple flowruns* : Endpoint is */ws/v2/timeline/flows/\{flow UID\}/runs*, where the flow UID is of the form *cluster:user:flowname*, i.e. the one returned by the query above. This query returns a UID of the form *cluster:user:flowname:runid* for each flow run. *Query single flowrun* : Endpoint is */ws/v2/timeline/runs/\{flowrun UID\}*, where the flowrun UID is of the form *cluster:user:flowname:runid*, i.e. the one returned by the query above. This query also returns a UID of the form *cluster:user:flowname:runid* for the flowrun returned. *Query multiple apps in a flowrun* : Endpoint can be */ws/v2/timeline/runs/\{flowrun UID\}/apps*, where the flowrun UID is of the form *cluster:user:flowname:runid*. This query also returns a UID of the form *cluster:user:flowname:runid:appid* for each app returned. *Query single app* : Endpoint can be */ws/v2/timeline/apps/\{app UID\}*, where the app UID is of the form *cluster:user:flowname:runid:appid*, i.e. the one returned by the query above. *Query Entities* : Endpoint can be */ws/v2/timeline/apps/\{app UID\}/entities/\{entitytype\}* or */ws/v2/timeline/apps/\{app UID\}/\{entitytype\}*. Thoughts? The entity type is separate because we cannot know the entity type when we query apps. This query also returns a UID of the form *cluster:user:flowname:runid:appid:entitytype:entityid* for each entity returned. 
*Query Entity* : Endpoint can be */ws/v2/timeline/entities/\{entity UID\}*, where the entity UID is of the form *cluster:user:flowname:runid:appid:entitytype:entityid* {panel} * One more question we need to discuss is whether the UID really needs to be sent from the timeline reader, or whether the client can construct it. Basically, can Ember construct it? Please note that things like users, flows, etc., i.e. flow context information, will not be available in the app query or entity query response, so Ember cannot easily fetch it from the REST response. Or would it be easier for Ember if the UID came in the response? If the UID has to come in the response, we can probably elevate it to TimelineEntity as an extra field. Also, as discussed, construction of the UID can be done in the Timeline Reader Manager instead of the storage layer. cc [~sjlee0], [~gtCarrera9], [~leftnoteasy] Let's reach a consensus and conclude this before the holidays. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
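The UID scheme sketched above can be illustrated as follows (an illustrative Python sketch; the component order comes from the comment, while a real implementation would also need to escape ':' inside components, which this sketch ignores):

```python
# UID layout from the comment:
#   flow UID:    cluster:user:flowname
#   flowrun UID: cluster:user:flowname:runid
#   app UID:     cluster:user:flowname:runid:appid
#   entity UID:  cluster:user:flowname:runid:appid:entitytype:entityid
def make_uid(*parts):
    """Join UID components; assumes no ':' inside any component."""
    return ":".join(str(p) for p in parts)

def split_uid(uid):
    """Split a UID back into its components."""
    return uid.split(":")

flowrun_uid = make_uid("cluster1", "user1", "flow1", 1450000000001)
# Each level extends the parent UID, so a child UID can be built from
# the UID returned by the previous query plus the new id.
app_uid = make_uid(*split_uid(flowrun_uid), "application_1450825622869_0001")
```

This also shows why the question about who constructs the UID matters: building `app_uid` requires the flow context (cluster, user, flowname, runid), which is exactly what an app or entity query response would not carry on its own.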
[jira] [Commented] (YARN-4424) Fix deadlock in RMAppImpl
[ https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070180#comment-15070180 ] Vinod Kumar Vavilapalli commented on YARN-4424: --- This originally never made it to branch-2.7.2, even though the fix version was set so. Thanks to [~djp] for catching this. I just cherry-picked it for rolling a new RC for 2.7.2. FYI. > Fix deadlock in RMAppImpl > - > > Key: YARN-4424 > URL: https://issues.apache.org/jira/browse/YARN-4424 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Jian He >Priority: Blocker > Fix For: 2.7.2, 2.6.3 > > Attachments: YARN-4424.1.patch > > > {code} > yarn@XXX:/mnt/hadoopqe$ /usr/hdp/current/hadoop-yarn-client/bin/yarn > application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING > 15/12/04 21:59:54 INFO impl.TimelineClientImpl: Timeline service address: > http://XXX:8188/ws/v1/timeline/ > 15/12/04 21:59:54 INFO client.RMProxy: Connecting to ResourceManager at > XXX/0.0.0.0:8050 > 15/12/04 21:59:55 INFO client.AHSProxy: Connecting to Application History > server at XXX/0.0.0.0:10200 > {code} > {code:title=RM log} > 2015-12-04 21:59:19,744 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 237000 > 2015-12-04 22:00:50,945 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 238000 > 2015-12-04 22:02:22,416 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 239000 > 2015-12-04 22:03:53,593 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 24 > 2015-12-04 22:05:24,856 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 241000 > 2015-12-04 22:06:56,235 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 242000 > 2015-12-04 22:08:27,510 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 243000 > 
2015-12-04 22:09:58,786 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 244000 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070246#comment-15070246 ] Li Lu commented on YARN-4224: - After all this discussion, I think it would be helpful to come up with a write-up of our REST API design. We can post the write-up here so that it's much simpler to get a big picture of our reader REST APIs. I can certainly help with this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4502) Sometimes Two AM containers get launched
[ https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4502: - Target Version/s: 2.6.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM
[ https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070185#comment-15070185 ] Vrushali C commented on YARN-3995: -- Hi [~Naganarasimha] Thanks for the thoughts on the jira. I was wondering if the following is a feasible solution: - can the NM maintain a list/map of “zombie app ids” for the AMs/collectors that it is removing? That way, when metrics arrive at the NM from other NMs for those zombie app ids, it can see that this was for an app that previously had a collector, and hence is most likely still a valid metric/entity, and then somehow write it to the backend, perhaps via a “common parent collector” process or something. - we can have the NM periodically prune this zombie list; perhaps, say, a few days after app completion, remove the info for that app from the zombie app list. I am not too knowledgeable about the NM, so I am not sure if this is complicated/infeasible. > Some of the NM events are not getting published due race condition when AM > container finishes in NM > > > Key: YARN-3995 > URL: https://issues.apache.org/jira/browse/YARN-3995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Affects Versions: YARN-2928 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > As discussed in YARN-3045: While testing in TestDistributedShell it was found > that a few of the container metrics events were failing, as there is a race > condition: when the AM container finishes and removes the collector for the > app, there is still a possibility that the events published for the app by > the current NM and other NMs are still in the pipeline, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
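The zombie-list idea floated above could be sketched roughly as follows (illustrative Python, not NM code; the class name, method names, and retention window are all hypothetical):

```python
import time

class ZombieAppTracker:
    """Remember apps whose collector was just removed, so late metrics
    arriving from other NMs can still be recognized as valid, and prune
    entries after a retention window (e.g. a few days)."""

    def __init__(self, retention_secs=3 * 24 * 3600):
        self.retention = retention_secs
        self._zombies = {}   # app_id -> time the collector was removed

    def collector_removed(self, app_id, now=None):
        # Called when the AM finishes and the app's collector is removed.
        self._zombies[app_id] = time.time() if now is None else now

    def is_zombie(self, app_id):
        # Late metric for this app? Then it previously had a collector
        # here and is most likely still valid.
        return app_id in self._zombies

    def prune(self, now=None):
        # Periodic cleanup: drop entries older than the retention window.
        now = time.time() if now is None else now
        self._zombies = {a: t for a, t in self._zombies.items()
                         if now - t < self.retention}

# Deterministic timestamps for illustration.
tracker = ZombieAppTracker(retention_secs=100)
tracker.collector_removed("app_1", now=0)
```

Within the window a late metric for `app_1` would be accepted; after pruning past the window it would be treated as unknown, which matches the periodic-prune step in the comment.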
[jira] [Assigned] (YARN-4502) Sometimes Two AM containers get launched
[ https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-4502: Assignee: Wangda Tan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070306#comment-15070306 ] Li Lu commented on YARN-4224: - Thanks [~leftnoteasy]! I agree that we should separate the semantics from the implementation. Our web UI, as one user of the REST API, does not really need general queries for timeline entities (I can always attach an entity type if needed). However, from the API design perspective, I'd hope our API is general enough. Having APIs like "list all entities within one application" may seem too ambitious to implement, but something like "on this endpoint I assume you want all entities for this application, but to avoid crashing I'm only returning a part of them" looks fine. However, enforcing an entity type on all such queries and adding it as part of the endpoint looks a little bit suboptimal (it also changes the way we organize resources). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070221#comment-15070221 ] Varun Saxena commented on YARN-4224: Correction - "For UID, we can put the entity type as a query param, and for the hierarchical endpoint put the entity type as a path param." But that's not consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070242#comment-15070242 ] Hadoop QA commented on YARN-4479:
-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| +1 | mvninstall | 8m 5s | trunk passed |
| +1 | compile | 0m 33s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 36s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 43s | trunk passed |
| +1 | mvneclipse | 0m 18s | trunk passed |
| +1 | findbugs | 1m 18s | trunk passed |
| +1 | javadoc | 0m 26s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 30s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 36s | the patch passed |
| +1 | compile | 0m 36s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 36s | the patch passed |
| +1 | compile | 0m 37s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 37s | the patch passed |
| -1 | checkstyle | 0m 14s | Patch generated 10 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 373, now 379). |
| +1 | mvnsite | 0m 39s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| -1 | whitespace | 0m 0s | The patch has 3 line(s) with tabs. |
| +1 | findbugs | 1m 27s | the patch passed |
| +1 | javadoc | 0m 26s | the patch passed with JDK v1.8.0_66 |
| -1 | javadoc | 3m 10s | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 2, now 3). |
| +1 | javadoc | 0m 35s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 65m 54s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 66m 50s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. |
| -1 | asflicense | 0m 23s | Patch generated 1 ASF License warnings. |
| | | 152m 28s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests |
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070240#comment-15070240 ] Varun Saxena commented on YARN-4224: Please note this is specific to the entities endpoint only. > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.wip.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070248#comment-15070248 ] Varun Saxena commented on YARN-4224: Yes, we can have a writeup. This will be useful during eventual documentation as well.
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070237#comment-15070237 ] Varun Saxena commented on YARN-4224: Or we can let it clash: if the string contains the designated delimiters, we treat it as a UID; otherwise, as an app id.
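The disambiguation suggested above can be sketched as follows. This is only an illustration of the idea, not the actual ATSv2 code; the `!` delimiter and the `looksLikeUid` helper are assumptions for the sketch, since the real reserved delimiter would be whatever the reader implementation settles on.

```java
// Hypothetical disambiguation between a UID and a plain application id:
// a path segment is treated as a UID if it contains the reserved delimiter.
public class IdOrUid {
    // '!' is assumed as the reserved UID delimiter for this sketch.
    static final char UID_DELIMITER = '!';

    public static boolean looksLikeUid(String segment) {
        return segment.indexOf(UID_DELIMITER) != -1;
    }
}
```

The clash this fails on is exactly the one discussed in the thread: an entity id that legitimately contains the delimiter character would be misclassified, which is why separate query nouns were also proposed.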
[jira] [Commented] (YARN-4343) Need to support Application History Server on ATSV2
[ https://issues.apache.org/jira/browse/YARN-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070239#comment-15070239 ] Vrushali C commented on YARN-4343: -- Hi [~Naganarasimha] Towards the end of today's call, you had mentioned this jira id as one of the jiras you wanted some feedback on. I think we discussed this in today's call, more or less? I looked through the previous comments and wanted to say that, when you get the chance, do lay out your proposal so that we can review this further. thanks Vrushali > Need to support Application History Server on ATSV2 > --- > > Key: YARN-4343 > URL: https://issues.apache.org/jira/browse/YARN-4343 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > > AHS is used by the CLI and Webproxy(REST), if the application related > information is not found in RM then it tries to fetch from AHS and show
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070256#comment-15070256 ] Varun Saxena commented on YARN-4224: bq. For UID based queries we can pass type as a query parameter. For the hierarchical endpoints, type is modeled as a part of entity ids (we have to do this to uniquely id an entity). IIUC, you mean that we can have the endpoint as /ws/v2/timeline/apps/\{app UID\}/entities?entityType=... for UID, and the endpoint as /ws/v2/timeline/apps/\{app UID\}/entities/\{entitytype\} for the hierarchical REST URL. Let's reach an agreement on this then. Frankly, a query without an entity type won't be very useful, but let's do this for differentiation. Any issues in making a check for entityType not being supplied, though (other than that it is a query param)? Currently a query without an entity type is not supported. Some changes, although minor, will have to be made in the storage layer for this.
[jira] [Created] (YARN-4502) Sometimes Two AM containers get launched
Yesha Vora created YARN-4502:
Summary: Sometimes Two AM containers get launched
Key: YARN-4502
URL: https://issues.apache.org/jira/browse/YARN-4502
Project: Hadoop YARN
Issue Type: Bug
Reporter: Yesha Vora
Priority: Critical

Scenario:
* set yarn.resourcemanager.am.max-attempts = 2
* start dshell application
{code}
yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar hadoop-yarn-applications-distributedshell-*.jar -attempt_failures_validity_interval 6 -shell_command "sleep 150" -num_containers 16
{code}
* Kill the AM pid
* Print the container list for the 2nd attempt
{code}
yarn container -list appattempt_1450825622869_0001_02
INFO impl.TimelineClientImpl: Timeline service address: http://xxx:port/ws/v1/timeline/
INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10:
Total number of containers :2
Container-Id Start Time Finish Time State Host Node Http Address LOG-URL
container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 N/A RUNNING xxx:25454 http://xxx:8042 http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa
container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 N/A RUNNING xxx:25454 http://xxx:8042 http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa
{code}
* look for the new AM pid

Here, the 2nd AM container was supposed to be started on container_e12_1450825622869_0001_02_01. But the AM was not launched on container_e12_1450825622869_0001_02_01; it was in the ACQUIRED state. On the other hand, container_e12_1450825622869_0001_02_02 got the AM running.
Expected behavior: RM should not start 2 containers for starting the AM
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070266#comment-15070266 ] Varun Saxena commented on YARN-4224: An important point. For the entity table, the row keys are not sorted by created time. So when we fetch records from HBase, a limit of 100, for instance, does not mean that we stop after fetching the first 100 records. We will continue fetching records as long as the row prefix matches, and keep removing the last entity based on created time to limit the entities to 100. So quite a few rows are scanned. If we do not make entity type mandatory, this would mean scanning even more rows, especially since for the generic entity table the entity type can be anything. So I would prefer to have a check making the entity type mandatory.
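The trimming described above can be sketched as follows. This is a minimal illustration, not the real reader code: `TimelineEntityStub` is a hypothetical stand-in for `TimelineEntity`, and the point is only that every matching row is visited while a bounded set of the newest entities (by created time) is retained.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Sketch: scan all rows matching the prefix, but keep only the 'limit'
// newest entities by created time, evicting the oldest retained one.
public class EntityLimiter {
    static class TimelineEntityStub {
        final String id;
        final long createdTime;
        TimelineEntityStub(String id, long createdTime) {
            this.id = id;
            this.createdTime = createdTime;
        }
    }

    public static TreeSet<TimelineEntityStub> limitByCreatedTime(
            Iterable<TimelineEntityStub> scannedRows, int limit) {
        // Newest first; ties broken by id so distinct entities are kept.
        TreeSet<TimelineEntityStub> kept = new TreeSet<>(
            Comparator.comparingLong((TimelineEntityStub e) -> e.createdTime)
                      .reversed()
                      .thenComparing(e -> e.id));
        for (TimelineEntityStub e : scannedRows) {
            kept.add(e);
            if (kept.size() > limit) {
                kept.pollLast(); // drop the oldest retained entity
            }
        }
        return kept;
    }
}
```

This makes the cost model in the comment concrete: the loop still touches every scanned row, so the limit bounds memory and response size but not the scan.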
[jira] [Commented] (YARN-4502) Sometimes Two AM containers get launched
[ https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070289#comment-15070289 ] Wangda Tan commented on YARN-4502: -- Thanks to [~yeshavora] for reporting this issue. Looked at this issue with [~jianhe]/[~vinodkv]; the root cause of this problem is:
- After YARN-3535, all containers transitioning from the ALLOCATED to the KILLED state are re-added to the scheduler, and their resource requests are added to the *current* scheduler application attempt.
- If some containers are in the ALLOCATED state when the AM crashes, the resource requests of these containers can be added to the *new* scheduler application attempt.
- When the new application attempt requests its AM container, it calls
{code}
// AM resource has been checked when submission
Allocation amContainerAllocation =
    appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,
        Collections.singletonList(appAttempt.amReq),
        EMPTY_CONTAINER_RELEASE_LIST, null, null);
if (amContainerAllocation != null
    && amContainerAllocation.getContainers() != null) {
  assert (amContainerAllocation.getContainers().size() == 0);
}
{code}
Some containers could be allocated by this scheduler.allocate call; these containers will be ignored because the following *assert* is not enabled in production environments.
- As a result, some containers can be leaked when we allocate retried AM containers.
*Possible fixes*:
1) Release all allocated containers of {{amContainerAllocation.getContainers()}}, OR
2) Instead of using {{getCurrentAttemptForContainer}} in {{AbstractYarnScheduler#recoverResourceRequestForContainer}}, only recover the ResourceRequest to the attempt which includes the container.
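Fix option 1 above could look roughly like the following. `Allocation` and `Container` here are simplified stand-ins (assumptions for the sketch), not the real org.apache.hadoop.yarn scheduler types, and only the control flow of releasing whatever the allocate() call unexpectedly handed back is shown.

```java
import java.util.ArrayList;
import java.util.List;

public class AmAllocationCleanup {
    // Simplified stand-ins for the scheduler types (sketch assumptions).
    static class Container {
        final String id;
        Container(String id) { this.id = id; }
    }

    static class Allocation {
        private final List<Container> containers;
        Allocation(List<Container> containers) { this.containers = containers; }
        List<Container> getContainers() { return containers; }
    }

    // Instead of 'assert size() == 0' (a no-op in production, where -ea is
    // off), explicitly release every container the allocate() call returned.
    public static List<String> releaseUnexpected(Allocation amContainerAllocation) {
        List<String> released = new ArrayList<>();
        if (amContainerAllocation != null
                && amContainerAllocation.getContainers() != null) {
            for (Container c : amContainerAllocation.getContainers()) {
                // The real fix would route this through the scheduler's
                // container-release path; here we only record the id.
                released.add(c.id);
            }
        }
        return released;
    }
}
```

The key design point is the one Wangda raises: an `assert` is stripped in production, so the leak is silent; an explicit release (or fix option 2, recovering requests to the owning attempt) makes the invariant hold regardless of JVM flags.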
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070295#comment-15070295 ] Li Lu commented on YARN-4224: - Well, returning the first 100 entities is just one example (we could even say "return one random entity within the given application", for example). For API design, we don't want implementations to affect our interfaces too much. Entity type is not a mandatory part of an entity query, so we can keep it optional for entity queries.
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070632#comment-15070632 ] Varun Saxena commented on YARN-4224: bq. At any rate, I agree that due to the possibility of omission ambiguities are perhaps possible. In that case, I suspect using different query nouns might be the ultimate solution (e.g. "apps" for the hierarchical and "apps-uid" for UIDs). Although it sounds awkward, I am leaning towards it as well.
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070658#comment-15070658 ] Varun Saxena commented on YARN-4224: BTW, even in ATSv1 the REST endpoint for fetching multiple entities looks like {{/ws/v1/timeline/\{entitytype\}}}, which means multiple entities are returned within the scope of an entity type. So there might not be a use case for this. Anyway, in v2 we can change that with the knowledge that queries without an entity type may be slow with the HBase implementation.
[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM
[ https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070635#comment-15070635 ] Naganarasimha G R commented on YARN-3995: - Thanks for the comments [~sjlee0]. IIUC, the 2nd point is a continuation of the first idea, right? bq. I am not too knowledgeable about the NM and so not sure if this is complicated/infeasible. {{PerNodeTimelineCollectorsAuxService}} can take this responsibility, so I don't see any problem with it on the NM side, right? I can think of a little modification on top of your idea: * Once the NM notifies the auxiliary service that the app is finished (by the container-finished call in the existing way), {{PerNodeTimelineCollectorsAuxService}} can move this collector to a zombie-collector map. * This map stores the last event-published time for the zombie collector. * We can have one thread running to check which zombie collectors have been inactive for a configurable time period and then remove them. Thus none of the events are lost till the end. For example, we can keep this period at 2 mins: if a collector in the zombie list has not been active for 2 mins, remove it and close it. > Some of the NM events are not getting published due race condition when AM > container finishes in NM > > > Key: YARN-3995 > URL: https://issues.apache.org/jira/browse/YARN-3995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Affects Versions: YARN-2928 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > As discussed in YARN-3045: While testing in TestDistributedShell found out > that few of the container metrics events were failing as there will be race > condition. When the AM container finishes and removes the collector for the > app, still there is possibility that all the events published for the app by > the current NM and other NM are still in pipeline,
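The zombie-collector bookkeeping proposed above could be sketched as follows. This is a hedged illustration only: class and method names (`ZombieCollectorSweeper`, `markZombie`, `sweep`) are invented for the sketch, and the real implementation would also stop the collector itself rather than just drop the map entry.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the zombie-collector map: app finish moves a collector into the
// map, late events push its timestamp forward, and a periodic sweep closes
// collectors idle longer than the (configurable) linger period.
public class ZombieCollectorSweeper {
    // appId -> last time (ms) an event was published through the collector
    private final Map<String, Long> zombieCollectors = new ConcurrentHashMap<>();
    private final long lingerMs;

    public ZombieCollectorSweeper(long lingerMs) {
        this.lingerMs = lingerMs;
    }

    // Called when the NM reports the app finished: collector becomes a zombie.
    public void markZombie(String appId, long nowMs) {
        zombieCollectors.put(appId, nowMs);
    }

    // Called whenever a late event arrives: push the timeout forward.
    public void recordActivity(String appId, long nowMs) {
        zombieCollectors.replace(appId, nowMs);
    }

    // Periodic sweep thread: remove collectors idle beyond the linger period.
    public int sweep(long nowMs) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it =
            zombieCollectors.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() >= lingerMs) {
                it.remove(); // real code would also close/stop the collector
                removed++;
            }
        }
        return removed;
    }
}
```

A single sweeper thread over the whole map, rather than one thread per collector, is the natural choice here, which also answers the later question in the thread about per-collector close threads.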
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070634#comment-15070634 ] Varun Saxena commented on YARN-4224: In short, the limit on the number of entities to return won't have any impact on the number of rows to scan. We will have to scan all possible rows for that row prefix.
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070649#comment-15070649 ] Varun Saxena commented on YARN-4224: bq. For API design, we don't want implementations to affect our interfaces too much That is a fair point. But then our main HBase implementation may not be able to support it with good performance. And frankly, if we keep entity type as an optional query param, shouldn't we keep it optional for the hierarchical endpoint as well? Why only for the UID endpoint?
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070652#comment-15070652 ] Naganarasimha G R commented on YARN-3367: - Thanks for the comments [~sjlee0]. bq. I would very much advocate using the JDK's ExecutorService (single-threaded executor in this case) over using a raw thread and its own blocking queue management. Maybe I didn't get your thoughts completely, but let me explain the reason I have taken this approach. Some points I have considered: * We require all the events to be pushed in the order in which they are submitted. I am not sure whether we require order to be maintained across sync and async events, but within sync/async it is definitely required, based on [~djp]'s last comment (for example, metric events which are sent as async need to be ordered to ensure the aggregation logic works properly, and sync events like container started/stopped need to be ordered so that their state can be determined if there are intermittent daemon failures). * As it's single-threaded, it is better to merge the related events and push them once (e.g. all the waiting async events can be clubbed and pushed at once). * For the sync events we need to throw an exception on failure, so that the caller is informed that it failed. Considering this, I thought of maintaining a blocking queue and a thread, so that whenever data is available the code in the thread can take some action (and by the time the thread finishes publishing and comes back to read the queue, if multiple async entities are present it can merge and publish them in the next round). Maybe the complexity will be reduced if we *need not maintain the order* across sync and async events. Or please tell me if I have increased the scope of the jira beyond what is required. bq. On TimelineEntities.java, Yep, I can incorporate those changes; I just relied on Eclipse auto code generation for hashCode and equals for the given class.
> Replace starting a separate thread for post entity with event loop in > TimelineClient > > > Key: YARN-3367 > URL: https://issues.apache.org/jira/browse/YARN-3367 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > Attachments: YARN-3367-feature-YARN-2928.003.patch, > YARN-3367-feature-YARN-2928.v1.002.patch, > YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch > > > Since YARN-3039, we add loop in TimelineClient to wait for > collectorServiceAddress ready before posting any entity. In consumer of > TimelineClient (like AM), we are starting a new thread for each call to get > rid of potential deadlock in main thread. This way has at least 3 major > defects: > 1. The consumer need some additional code to wrap a thread before calling > putEntities() in TimelineClient. > 2. It cost many thread resources which is unnecessary. > 3. The sequence of events could be out of order because each posting > operation thread get out of waiting loop randomly. > We should have something like event loop in TimelineClient side, > putEntities() only put related entities into a queue of entities and a > separated thread handle to deliver entities in queue to collector via REST > call.
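The queue-and-drain behaviour described in the comment can be sketched as below. This is an assumption-laden simplification: entities are plain strings, the REST publish step is omitted, and `EntityPublisherLoop`/`nextBatch` are invented names; only the two properties argued for above are shown, namely FIFO ordering and clubbing of all waiting async events into one batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the single-consumer event loop: take one entity (blocking),
// then drain everything else already queued so waiting async entities are
// merged into a single publish call.
public class EntityPublisherLoop {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Producer side: async putEntities() just enqueues and returns.
    public void putEntityAsync(String entity) {
        queue.offer(entity);
    }

    // One iteration of the consumer loop; blocks until at least one entity
    // is available, then batches the rest, preserving submission order.
    public List<String> nextBatch() throws InterruptedException {
        List<String> batch = new ArrayList<>();
        batch.add(queue.take());
        queue.drainTo(batch); // club all currently waiting events
        return batch;
    }
}
```

The trade-off Sangjin raises still stands: a single-threaded `ExecutorService` gives the same FIFO guarantee with less hand-rolled queue management, but batching multiple queued entities into one REST call is easier to express with an explicit queue like this.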
[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM
[ https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070709#comment-15070709 ] Naganarasimha G R commented on YARN-3995: - bq. If I recall, this window of opportunity is going to be quite small because any non-AM container will be completed before the app can be finished (and the AM container is completed). This is true in most cases, unless the AM doesn't wait for the containers it launched/requested to go down before it goes down itself. I ran TestDistributedShell and cross-verified the logs for any errors due to the collector not being there, and didn't find any for the containers launched by it. But TestDistributedShell launches only 2 containers; if we run with more containers then we can see the impact. bq. I suspect a simple linger might be sufficient, but do we see a case where we might miss writes otherwise? Yes, a simple linger should be sufficient. Shall I make this period configurable, so that there is a backup option in case of any issues and, if required in future, we can handle it in a better way? Also, is launching one thread per collector for closing it fine? IMO a configurable linger period is sufficient.
[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM
[ https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070690#comment-15070690 ] Sangjin Lee commented on YARN-3995: --- If I recall, this window of opportunity is going to be quite small because any non-AM container will be completed before the app can be finished (and the AM container is completed). For this inversion to occur, there would have to be writes that originate from a remote NM that had a container (which had already been completed) but get delayed in reaching the timeline collector for some reason. I suspect a simple linger might be sufficient, but do we see a case where we might miss writes otherwise?
[jira] [Commented] (YARN-4462) FairScheduler: Disallow preemption from a queue
[ https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070707#comment-15070707 ] Hadoop QA commented on YARN-4462:
-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 8m 47s | trunk passed |
| +1 | compile | 0m 33s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 31s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 37s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 1m 11s | trunk passed |
| +1 | javadoc | 0m 22s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 26s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 34s | the patch passed |
| +1 | compile | 0m 26s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 26s | the patch passed |
| +1 | compile | 0m 31s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 31s | the patch passed |
| -1 | checkstyle | 0m 12s | Patch generated 4 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 76, now 77). |
| +1 | mvnsite | 0m 37s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 8 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | findbugs | 1m 18s | the patch passed |
| +1 | javadoc | 0m 21s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 27s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 59m 7s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 60m 31s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. |
| -1 | asflicense | 0m 23s | Patch generated 1 ASF License warnings. |
| | | 138m 46s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
| | hadoop.yarn.server.resourcemanager.TestClientRMTokens |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA
[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM
[ https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070716#comment-15070716 ] Varun Saxena commented on YARN-3995: bq. what i am trying to suggest is close/remove the collector only after a period of inactivity in the collector That would be better. I guess what you mean is that instead of a hard timeout, we will have a rolling timeout, i.e. the timeout will keep being pushed out as entities are written. It will only fire once no entities have been written for the specified period. > Some of the NM events are not getting published due race condition when AM > container finishes in NM > > > Key: YARN-3995 > URL: https://issues.apache.org/jira/browse/YARN-3995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Affects Versions: YARN-2928 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > As discussed in YARN-3045: While testing in TestDistributedShell we found > that a few of the container metrics events were failing because of a race > condition. When the AM container finishes and removes the collector for the > app, there is still a possibility that events published for the app by the > current NM and other NMs are still in the pipeline, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070715#comment-15070715 ] Sangjin Lee commented on YARN-3367: --- I haven't fully digested the code, so I might be off base. But when I see variables such as stopped, waitForDrained, and drained along with the thread and the queue, it feels rather like reinventing the wheel. Also, I see two sets of wait-notify pairs. Using an executor service should take care of the need for those, and hopefully we can wrap the code between taking an item off the queue and looping back into a callable/runnable. > Replace starting a separate thread for post entity with event loop in > TimelineClient > > > Key: YARN-3367 > URL: https://issues.apache.org/jira/browse/YARN-3367 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > Attachments: YARN-3367-feature-YARN-2928.003.patch, > YARN-3367-feature-YARN-2928.v1.002.patch, > YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch > > > Since YARN-3039, we added a loop in TimelineClient to wait for > collectorServiceAddress to be ready before posting any entity. In consumers of > TimelineClient (like the AM), we start a new thread for each call to avoid a > potential deadlock in the main thread. This approach has at least 3 major > defects: > 1. The consumer needs some additional code to wrap a thread before calling > putEntities() in TimelineClient. > 2. It costs many thread resources, which is unnecessary. > 3. The sequence of events could be out of order because each posting > operation thread gets out of the waiting loop randomly. > We should have something like an event loop on the TimelineClient side: > putEntities() only puts the related entities into a queue, and a > separate thread delivers the queued entities to the collector via REST > calls.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070594#comment-15070594 ] Jian He commented on YARN-4138: --- Sorry, my bad. I don't know why AllocationExpirationInfo.java previously ended up in the hadoop-yarn/ directory. It seems the patch no longer applies on trunk. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after the container resource increase token expires, the running > container is killed. > This ticket will change the behavior so that when a container resource > increase token expires, the resource allocation of the container is > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4343) Need to support Application History Server on ATSV2
[ https://issues.apache.org/jira/browse/YARN-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070600#comment-15070600 ] Naganarasimha G R commented on YARN-4343: - Hi [~vrushalic], thanks for getting back on this. Maybe I confused you all a bit: what I meant was that Varun and I discussed this issue, and I will try to come up with a rough approach or a WIP patch to discuss further! > Need to support Application History Server on ATSV2 > --- > > Key: YARN-4343 > URL: https://issues.apache.org/jira/browse/YARN-4343 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > > AHS is used by the CLI and Webproxy(REST); if the application-related > information is not found in the RM, it tries to fetch it from AHS and show -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070611#comment-15070611 ] Sangjin Lee commented on YARN-3367: --- Thanks for the patch [~Naganarasimha]! I took a fairly quick look at the latest patch. One high-level comment: as for the async dispatcher, I would very much advocate using the JDK's ExecutorService (a single-threaded executor in this case) over using a raw thread and its own blocking queue management. It will definitely reduce the amount of code (and room for errors), and we can focus on the actual unit of work that needs to be done. Can you please consider using an ExecutorService over the thread + queue? If there is a compelling reason that an ExecutorService cannot work, I'd be curious to learn it. On TimelineEntities.java: - hashCode(): why not simply return entities.hashCode()? entities is never null. - equals(): again, note that entities is never null; that will simplify the implementation here. > Replace starting a separate thread for post entity with event loop in > TimelineClient > > > Key: YARN-3367 > URL: https://issues.apache.org/jira/browse/YARN-3367 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > Attachments: YARN-3367-feature-YARN-2928.003.patch, > YARN-3367-feature-YARN-2928.v1.002.patch, > YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch > > > Since YARN-3039, we added a loop in TimelineClient to wait for > collectorServiceAddress to be ready before posting any entity. In consumers of > TimelineClient (like the AM), we start a new thread for each call to avoid a > potential deadlock in the main thread. This approach has at least 3 major > defects: > 1. The consumer needs some additional code to wrap a thread before calling > putEntities() in TimelineClient. > 2. It costs many thread resources, which is unnecessary. > 3.
The sequence of events could be out of order because each posting > operation thread gets out of the waiting loop randomly. > We should have something like an event loop on the TimelineClient side: > putEntities() only puts the related entities into a queue, and a > separate thread delivers the queued entities to the collector via REST > calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
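Sangjin's suggestion above can be sketched roughly as follows. This is a hypothetical illustration, not the actual TimelineClient patch: the class name TimelineDispatcherSketch and the String-based putEntities() are invented for the example. A single-threaded ExecutorService replaces the raw dispatcher thread, the hand-managed blocking queue, and both wait-notify pairs; FIFO ordering and drain-on-stop come for free from the executor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a single-threaded executor in place of the raw
// thread + queue + stopped/drained/waitForDrained flags discussed above.
class TimelineDispatcherSketch {
  // The executor's internal queue replaces the explicit BlockingQueue,
  // and its single worker preserves submission (FIFO) order.
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private final List<String> delivered =
      Collections.synchronizedList(new ArrayList<>());

  // putEntities() just hands the unit of work to the executor; the caller
  // never blocks and never needs to spawn its own thread.
  void putEntities(String entity) {
    executor.submit(() -> delivered.add(entity)); // real code: REST call to collector
  }

  // shutdown() stops accepting new work but still runs everything already
  // queued -- this is the "wait for drained" behavior without the flags.
  List<String> stopAndGetDelivered() throws InterruptedException {
    executor.shutdown();
    executor.awaitTermination(5, TimeUnit.SECONDS);
    return delivered;
  }

  public static void main(String[] args) throws InterruptedException {
    TimelineDispatcherSketch d = new TimelineDispatcherSketch();
    for (int i = 1; i <= 3; i++) {
      d.putEntities("entity-" + i);
    }
    List<String> out = d.stopAndGetDelivered();
    // FIFO order is guaranteed by the single worker thread
    if (!out.equals(Arrays.asList("entity-1", "entity-2", "entity-3"))) {
      throw new AssertionError(out);
    }
    System.out.println("delivered in order: " + out);
  }
}
```

With this structure, stopping the client reduces to shutdown() plus awaitTermination(), so the two wait-notify pairs and the manual state flags disappear entirely.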
[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM
[ https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070660#comment-15070660 ] Naganarasimha G R commented on YARN-3995: - Oops, sorry, my mistake. Thanks [~sjlee0] for correcting me. [~sjlee0], the current code already waits for a second in a separate thread after the AM container is closed (in PerNodeTimelineCollectorsAuxService.stopContainer), but the issue with that approach is that it closes after 1 second even though events are still coming in. What I am trying to suggest is to close/remove the collector only after a period of inactivity in the collector. Will that be good, considering delivery will usually be delayed for metrics? If the above approach is not required, the already existing approach waits for a second in a separate thread; does it require any change? (The least I can think of is that a few extra threads will exist if more AMs are run from a single NM.) > Some of the NM events are not getting published due race condition when AM > container finishes in NM > > > Key: YARN-3995 > URL: https://issues.apache.org/jira/browse/YARN-3995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Affects Versions: YARN-2928 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > As discussed in YARN-3045: While testing in TestDistributedShell we found > that a few of the container metrics events were failing because of a race > condition. When the AM container finishes and removes the collector for the > app, there is still a possibility that events published for the app by the > current NM and other NMs are still in the pipeline, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2599) Standby RM should also expose some jmx and metrics
[ https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070678#comment-15070678 ] Rohith Sharma K S commented on YARN-2599: - Gearing up on this issue, I assume that just keeping the cluster metrics and JMX pages from redirecting to the active RM would be sufficient. But one thing I noticed is that in YARN-1898, JMX and metrics were removed from NON_REDIRECTED_URIS in the addendum patch. Is the intention of this JIRA to revert that, or do we need to add more JMX metrics for supporting HA, as in YARN-2442? [~kasha], would you provide your thoughts please? > Standby RM should also expose some jmx and metrics > -- > > Key: YARN-2599 > URL: https://issues.apache.org/jira/browse/YARN-2599 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Karthik Kambatla >Assignee: Rohith Sharma K S > > YARN-1898 redirects jmx and metrics to the Active. As discussed there, we > need to separate out the metrics displayed so the Standby RM can also be > monitored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM
[ https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070723#comment-15070723 ] Sangjin Lee commented on YARN-3995: --- bq. This is true in most of the cases, unless and until the AM doesn't wait for the containers launched/requested by it to go down before it goes down. Are you thinking of cases where the AM crashes? If the app finishes normally, this sequence does not happen, right? bq. Yes, a simple linger should be sufficient. Shall I make this a configurable period, so that there is a backup option in case of any issues, and if required in future we can handle it in a better way? Making it configurable sounds fine to me. bq. Also, is launching one thread per collector for closing it fine? I suspect it would be fine. Note that there would be a few collectors per NM at most. > Some of the NM events are not getting published due race condition when AM > container finishes in NM > > > Key: YARN-3995 > URL: https://issues.apache.org/jira/browse/YARN-3995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Affects Versions: YARN-2928 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > > As discussed in YARN-3045: While testing in TestDistributedShell we found > that a few of the container metrics events were failing because of a race > condition. When the AM container finishes and removes the collector for the > app, there is still a possibility that events published for the app by the > current NM and other NMs are still in the pipeline, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
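The rolling (inactivity-based) linger agreed on above can be sketched like this. It is only an illustration with invented names (InactivityTracker, touch(), isExpired()); the real change would live around PerNodeTimelineCollectorsAuxService.stopContainer. Each entity write pushes the expiry deadline forward, so the collector is removed only after the configurable linger period passes with no writes. The clock is injected to keep the logic deterministic.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of the rolling timeout: the collector expires only
// after `lingerMs` of inactivity, not a fixed delay after the AM finishes.
class InactivityTracker {
  private final long lingerMs;       // the configurable linger period
  private final LongSupplier clock;  // System::currentTimeMillis in production
  private long lastActivity;

  InactivityTracker(long lingerMs, LongSupplier clock) {
    this.lingerMs = lingerMs;
    this.clock = clock;
    this.lastActivity = clock.getAsLong();
  }

  // Called on every entity write: pushes the expiry deadline forward.
  synchronized void touch() {
    lastActivity = clock.getAsLong();
  }

  // A periodic sweeper would call this to decide whether the collector
  // for the app can finally be closed/removed.
  synchronized boolean isExpired() {
    return clock.getAsLong() - lastActivity >= lingerMs;
  }

  public static void main(String[] args) {
    long[] now = {0L};
    InactivityTracker t = new InactivityTracker(1000, () -> now[0]);

    now[0] = 800;
    t.touch();                       // an entity arrives at t=800ms
    now[0] = 1500;                   // a fixed 1s-after-finish timeout would fire here
    System.out.println("expired at 1500ms: " + t.isExpired()); // false: deadline was pushed
    now[0] = 1900;                   // 1100ms with no writes exceeds the linger
    System.out.println("expired at 1900ms: " + t.isExpired()); // true
  }
}
```

One shared sweeper (e.g. a ScheduledExecutorService) polling isExpired() for each live collector would also address the one-thread-per-collector concern, since at most a handful of collectors exist per NM.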