[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051236#comment-15051236 ] Junping Du commented on YARN-3623: -- bq. "it means the cluster will and should bring up the timeline service v.1.5 (and nothing else)." Thinking it over again, I am still uncomfortable with the description "(and nothing else)". It could work against our possible solutions for YARN-4368. I prefer to remove it in an addendum patch; otherwise it could impose unnecessary restrictions if it gets released in 2.8.0. Thoughts? > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
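The setting under discussion would be configured cluster-wide in yarn-site.xml roughly as below. This is a sketch: the property name follows the patch discussed above, but the exact value format (a floating-point version such as 1.5) is an assumption of this example.

```xml
<!-- Sketch of the cluster-wide Timeline Service version setting.
     The float-style value (e.g. 1.5) is an assumption of this example. -->
<property>
  <name>yarn.timeline-service.version</name>
  <value>1.5</value>
</property>
<property>
  <!-- existing on/off switch; the version config complements it -->
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
```

With this in place, frameworks can read one shared version value instead of each carrying its own v1/v2 switch, which is the motivation stated in the issue description.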
[jira] [Commented] (YARN-4434) NodeManager Disk Checker parameter documentation is not correct
[ https://issues.apache.org/jira/browse/YARN-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051363#comment-15051363 ] Junping Du commented on YARN-4434: -- Hi [~ajisakaa], we are moving on to 2.6.3-RC0. Would you hold off any commits (unless they are agreed blockers) to branch-2.6.3 in later commit efforts? Thanks. > NodeManager Disk Checker parameter documentation is not correct > --- > > Key: YARN-4434 > URL: https://issues.apache.org/jira/browse/YARN-4434 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, nodemanager >Affects Versions: 2.6.0, 2.7.1 >Reporter: Takashi Ohnishi >Assignee: Weiwei Yang >Priority: Minor > Fix For: 2.8.0, 2.6.3, 2.7.3 > > Attachments: YARN-4434.001.patch, YARN-4434.branch-2.6.patch > > > In the description of > yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage, > it says > {noformat} > The default value is 100 i.e. the entire disk can be used. > {noformat} > But, in yarn-default.xml and source code, the default value is 90. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
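For reference, operators who want the old documented behavior (the entire disk usable) have to override the property explicitly, since the shipped default is 90 as the report notes. A yarn-site.xml sketch:

```xml
<!-- NodeManager marks a local dir as bad once disk utilization exceeds
     this percentage. Default in yarn-default.xml and code is 90, not the
     100 the old documentation claimed; set it explicitly to override. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>
```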
[jira] [Updated] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4416: Description: While debugging in Eclipse I came across a scenario where I needed to know the name of a queue, but every time I tried to inspect the queue it hung. On seeing the stack I realized there was a deadlock, and on analysis found that it was only due to *queue.toString()* during debugging, as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need to ensure the following: # queueCapacity and resource-usage have their own read/write locks, hence synchronization is not required # numContainers is volatile, hence synchronization is not required. # a read/write lock could be added to OrderingPolicy. Read operations don't need to be synchronized, so {{getNumApplications}} doesn't need to be synchronized. (The first 2 will be handled in this jira and the third will be handled in YARN-4443) was: While debugging in Eclipse I came across a scenario where I needed to know the name of a queue, but every time I tried to inspect the queue it hung. On seeing the stack I realized there was a deadlock, and on analysis found that it was only due to *queue.toString()* during debugging, as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need to ensure the following: # queueCapacity and resource-usage have their own read/write locks. 
# > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > deadlock.log > > > While debugging in Eclipse I came across a scenario where I needed to know > the name of a queue, but every time I tried to inspect the queue it hung. > On seeing the stack I realized there was a deadlock, and on analysis found > that it was only due to *queue.toString()* during debugging, as > {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need > to ensure the following: > # queueCapacity and resource-usage have their own read/write locks, hence > synchronization is not required > # numContainers is volatile, hence synchronization is not required. > # a read/write lock could be added to OrderingPolicy. Read operations don't > need to be synchronized, so {{getNumApplications}} doesn't need to be > synchronized. > (The first 2 will be handled in this jira and the third will be handled in > YARN-4443) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
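The locking scheme proposed above can be sketched as follows. This is an illustrative toy, not the actual AbstractCSQueue code (the class and field names are invented for the example): a read/write lock guards the capacity field, numContainers stays volatile, and toString() takes no monitor, so a debugger thread calling it cannot participate in a lock-ordering deadlock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the fix: replace synchronized getters with a
// read/write lock, and rely on volatile for the simple counter.
class QueueUsageSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private float absoluteUsedCapacity = 0f;
    private volatile int numContainers = 0; // volatile: readers need no lock

    float getAbsoluteUsedCapacity() {
        lock.readLock().lock(); // readers never block each other
        try {
            return absoluteUsedCapacity;
        } finally {
            lock.readLock().unlock();
        }
    }

    void setAbsoluteUsedCapacity(float v) {
        lock.writeLock().lock(); // writers get exclusive access
        try {
            absoluteUsedCapacity = v;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // assumes a single writer thread; volatile alone makes the
    // increment visible but not atomic across multiple writers
    void incNumContainers() { numContainers++; }

    int getNumContainers() { return numContainers; }

    @Override
    public String toString() {
        // safe from any thread: no object monitor is acquired here
        return "usedCapacity=" + getAbsoluteUsedCapacity()
             + ", numContainers=" + getNumContainers();
    }
}
```

The key point mirrored from the description: because toString() no longer needs the queue's monitor, inspecting a queue in a debugger cannot hang on a lock held elsewhere.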
[jira] [Created] (YARN-4443) Improve locks of OrderingPolicy
Naganarasimha G R created YARN-4443: --- Summary: Improve locks of OrderingPolicy Key: YARN-4443 URL: https://issues.apache.org/jira/browse/YARN-4443 Project: Hadoop YARN Issue Type: Sub-task Reporter: Naganarasimha G R Assignee: Naganarasimha G R Improve the locks of OrderingPolicy, as it is tightly coupled with LeafQueue for its consistency. We should decouple it from LeafQueue for a better API design. Potentially we need to rethink the API of OrderingPolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051470#comment-15051470 ] Hadoop QA commented on YARN-4399: -
-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 7m 54s | trunk passed |
| +1 | compile | 0m 30s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 31s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 38s | trunk passed |
| +1 | mvneclipse | 0m 16s | trunk passed |
| +1 | findbugs | 1m 19s | trunk passed |
| -1 | javadoc | 0m 25s | hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. |
| +1 | javadoc | 0m 30s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 38s | the patch passed |
| +1 | compile | 0m 33s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 33s | the patch passed |
| +1 | compile | 0m 33s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 33s | the patch passed |
| +1 | checkstyle | 0m 14s | the patch passed |
| +1 | mvnsite | 0m 41s | the patch passed |
| +1 | mvneclipse | 0m 16s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 29s | the patch passed |
| -1 | javadoc | 0m 27s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| +1 | javadoc | 0m 30s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 67m 11s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 67m 8s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 23s | Patch does not generate ASF License warnings. |
| | | 153m 27s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12776828/YARN-4399.003.patch |
| JIRA Issue | YARN-4399 |
| Optional Tests | asflicense compile javac
[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3946: Attachment: YARN-3946.v1.008.patch Thanks for pointing it out [~wangda]; I have corrected the test case and fixed the applicable checkstyle issues. > Allow fetching exact reason as to why a submitted app is in ACCEPTED state in > CS > > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, > YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, > YARN-3946.v1.007.patch, YARN-3946.v1.008.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/ memory requirement not being met, or AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4416: Attachment: YARN-4416.v2.001.patch Hi [~wangda], as per your earlier comment I have split the jira into two and focused only on removing synchronization from the methods that are used in {{toString}}, excluding the changes required for the ordering policy. Further, I have taken care of two things: * absoluteCapacityResource was not used, hence I have removed the variable and the method * getNumContainers was unnecessarily overridden in LeafQueue, hence I removed it. Please provide your feedback. > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > YARN-4416.v2.001.patch, deadlock.log > > > While debugging in Eclipse I came across a scenario where I needed to know > the name of a queue, but every time I tried to inspect the queue it hung. > On seeing the stack I realized there was a deadlock, and on analysis found > that it was only due to *queue.toString()* during debugging, as > {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need > to ensure the following: > # queueCapacity and resource-usage have their own read/write locks, hence > synchronization is not required > # numContainers is volatile, hence synchronization is not required. > # a read/write lock could be added to OrderingPolicy. Read operations don't > need to be synchronized, so {{getNumApplications}} doesn't need to be > synchronized. > (The first 2 will be handled in this jira and the third will be handled in > YARN-4443) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051478#comment-15051478 ] Hudson commented on YARN-3623: -- FAILURE: Integrated in Hadoop-trunk-Commit #8951 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8951/]) YARN-3623-Addendum: Improve the description for Timeline Service Version (xgong: rev 21daa6c68a0bff51a14e748bf14d56b2f5a5580f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off
[ https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051498#comment-15051498 ] Sangjin Lee commented on YARN-4356: --- Thanks for the feedback. Now that YARN-3623 has been committed, I'll first cherry-pick that commit over to our branch. I'll also update the patch based on comments from [~djp] and [~Naganarasimha]. > ensure the timeline service v.2 is disabled cleanly and has no impact when > it's turned off > -- > > Key: YARN-4356 > URL: https://issues.apache.org/jira/browse/YARN-4356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: YARN-4356-feature-YARN-2928.002.patch, > YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, > YARN-4356-feature-YARN-2928.005.patch, > YARN-4356-feature-YARN-2928.poc.001.patch > > > For us to be able to merge the first milestone drop to trunk, we want to > ensure that once disabled the timeline service v.2 has no impact from the > server side to the client side. If the timeline service is not enabled, no > action should be done. If v.1 is enabled but not v.2, v.1 should behave the > same as it does before the merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4422) Generic AHS sometimes doesn't show started, node, or logs on App page
[ https://issues.apache.org/jira/browse/YARN-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051548#comment-15051548 ] Ming Ma commented on YARN-4422: --- Thanks [~eepayne]. > Generic AHS sometimes doesn't show started, node, or logs on App page > - > > Key: YARN-4422 > URL: https://issues.apache.org/jira/browse/YARN-4422 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 3.0.0, 2.8.0, 2.7.3 > > Attachments: AppAttemptPage no container or node.jpg, AppPage no logs > or node.jpg, YARN-4422.001.patch > > > Sometimes the AM container for an app isn't able to start the JVM. This can > happen if bogus JVM options are given to the AM container ( > {{-Dyarn.app.mapreduce.am.command-opts=-InvalidJvmOption}}) or when > misconfiguring the AM container's environment variables > ({{-Dyarn.app.mapreduce.am.env="JAVA_HOME=/foo/bar/baz}}) > When the AM container for an app isn't able to start the JVM, the Application > page for that application shows {{N/A}} for the {{Started}}, {{Node}}, and > {{Logs}} columns. It _does_ have links for each app attempt, and if you click > on one of them, you go to the Application Attempt page, where you can see all > containers with links to their logs and nodes, including the AM container. > But none of that shows up for the app attempts on the Application page. > Also, on the Application Attempt page, in the {{Application Attempt > Overview}} section, the {{AM Container}} value is {{null}} and the {{Node}} > value is {{N/A}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051459#comment-15051459 ] Xuan Gong commented on YARN-3623: - +1 Checking this in > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051542#comment-15051542 ] Kuhu Shukla commented on YARN-4311: --- [~eepayne], request for comments and review. Thanks a lot. > Removing nodes from include and exclude lists will not remove them from > decommissioned nodes list > - > > Key: YARN-4311 > URL: https://issues.apache.org/jira/browse/YARN-4311 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-4311-v1.patch, YARN-4311-v2.patch, > YARN-4311-v3.patch > > > In order to fully forget about a node, removing the node from include and > exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The > tricky part that [~jlowe] pointed out was the case when include lists are not > used, in that case we don't want the nodes to fall off if they are not active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051463#comment-15051463 ] Xuan Gong commented on YARN-3623: - Committed the addendum patch in trunk/branch-2. Thanks, junping. > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3623: - Attachment: YARN-3623-addendum.patch Sounds good. I have put an addendum patch here. Please check if it makes sense. > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051335#comment-15051335 ] Naganarasimha G R commented on YARN-3623: - +1 lgtm > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051332#comment-15051332 ] Sunil G commented on YARN-4108: --- Hi [~leftnoteasy], thank you for sharing the detailed doc and patch. It's a wonderful effort and it came out nicely. I mainly went through the doc, and checked a few parts of the patch. I will go through the patch in detail soon and share more comments if I have any. A few major doubts: 1. With the different {{PreemptionType}}s, are we planning to handle preemption across queues, within a queue (fifo/priority), within a user, etc.? YARN-2009 was trying to handle preemption within a queue adhering to priority. 2. Currently all containers from a node are selected, and we try to find which of them match the preemption type; later {{selectContainersToPreempt}} helps to clear out the invalid containers. It would be a great help if some interface provided the flexibility to sort containers, even though a great deal of validation is done. The sorting parameter could be: submitted time; priority of the app (since we take a bunch of containers from a node first, only a few apps in the cluster will come in one shot); priority of the containers; or time remaining for the container to finish (% of completion). With this flexibility, users can tune which containers will be their first choice for preemption, provided all the size/user-limit/locality constraints are matched. 3. Could we get a choice to kill containers based on data locality? I could see the changes, but couldn't see how it is achieved on the preemption manager end. > CapacityScheduler: Improve preemption to preempt only those containers that > would satisfy the incoming request > -- > > Key: YARN-4108 > URL: https://issues.apache.org/jira/browse/YARN-4108 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4108-design-doc-v1.pdf, YARN-4108.poc.1.patch > > > This is a sibling JIRA for YARN-2154. 
We should make sure container preemption > is more effective. > *Requirements:* > 1) Can handle the case of user-limit preemption > 2) Can handle the case of resource placement requirements, such as: hard-locality > (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I > don't want to use rack1 and host\[1-3\]) > 3) Can handle preemption within a queue: cross-user preemption (YARN-2113), > cross-application preemption (such as priority-based (YARN-1963) / > fairness-based (YARN-3319)). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051419#comment-15051419 ] Junping Du commented on YARN-3623: -- Yes. That's the flexibility we want here - we may bring up an old-version ATS service for legacy running apps during an upgrade. So we should pay more attention here, given that configurations (especially in yarn-default.xml) serve as a public protocol to Hadoop users. > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-1856: Attachment: YARN-1856.004.patch {quote} Should add all the configs to yarn-default.xml, saying they are still early configs? {quote} I don't think we've figured out how to specify the various resource isolation pieces from a config perspective. I'd like to keep them private for now, and I'll file a follow-up JIRA to document the configs once we've figured it out. The remaining points all relate to this, so I'll address them as part of that JIRA. {quote} ResourceHandlerModule - Formatting of new code is a little off: the declaration of getCgroupsMemoryResourceHandler(). There are other occurrences like this in that class before in this patch, you may want to fix those. {quote} Fixed. {quote} BUG! getCgroupsMemoryResourceHandler() incorrectly locks DiskResourceHandler instead of MemoryResourceHandler. CGroupsMemoryResourceHandlerImpl {quote} Not a bug, but a *bad* typo nonetheless. Fixed. {quote} What is this doing? {{ CGroupsHandler.CGroupController MEMORY = CGroupsHandler.CGroupController.MEMORY; }} Is it forcing a class-load or something? Not sure if this is needed. If this is needed, you may want to add a comment here. {quote} No, it's just a shorthand to avoid specifying the entire qualified variable every time. {quote} NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERC -> NM_MEMORY_RESOURCE_CGROUPS_SOFT_LIMIT_PERCENTAGE. Similarly the default constant. {quote} Fixed. {quote} CGROUP_PARAM_MEMORY_HARD_LIMIT_BYTES / CGROUP_PARAM_MEMORY_SOFT_LIMIT_BYTES / CGROUP_PARAM_MEMORY_SWAPPINESS can all be static and final. {quote} Interface variables are public static final by default. Any reason you want to add static final? 
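On that last point: fields declared in a Java interface are implicitly public, static, and final, which is why writing the modifiers explicitly is redundant. A minimal illustration (the interface and field names below are made up for the example, not the actual patch code):

```java
import java.lang.reflect.Modifier;

// Hypothetical interface mirroring the style discussed above: no modifiers
// are written on the field, yet the language makes it public static final.
interface CGroupParamsSketch {
    String CGROUP_PARAM_MEMORY_SWAPPINESS = "memory.swappiness";
}

class InterfaceFieldCheck {
    // Verifies via reflection that the interface field carries the
    // implicit public static final modifiers.
    static boolean isImplicitConstant() {
        try {
            int m = CGroupParamsSketch.class
                    .getField("CGROUP_PARAM_MEMORY_SWAPPINESS")
                    .getModifiers();
            return Modifier.isPublic(m)
                && Modifier.isStatic(m)
                && Modifier.isFinal(m);
        } catch (NoSuchFieldException e) {
            return false;
        }
    }
}
```

Since the modifiers are implied, adding `static final` to such constants changes nothing at compile time; it is purely a readability choice.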
> cgroups based memory monitoring for containers > -- > > Key: YARN-1856 > URL: https://issues.apache.org/jira/browse/YARN-1856 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.3.0 >Reporter: Karthik Kambatla >Assignee: Varun Vasudev > Attachments: YARN-1856.001.patch, YARN-1856.002.patch, > YARN-1856.003.patch, YARN-1856.004.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent
[ https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-4358: -- Attachment: YARN-4358.addendum-2.patch Updated based on [~subru]'s suggestion > Improve relationship between SharingPolicy and ReservationAgent > --- > > Key: YARN-4358 > URL: https://issues.apache.org/jira/browse/YARN-4358 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.8.0 > > Attachments: YARN-4358.2.patch, YARN-4358.3.patch, YARN-4358.4.patch, > YARN-4358.addendum-2.patch, YARN-4358.addendum.patch, YARN-4358.patch > > > At the moment an agent places based on available resources, but has no > visibility to extra constraints imposed by the SharingPolicy. While not all > constraints are easily represented some (e.g., max-instantaneous resources) > are easily represented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050449#comment-15050449 ] Rohith Sharma K S commented on YARN-3226: - Sorry for coming in late; I am looking into the UI part of the code. The UI looks good, and I tested it in a one-node cluster for CS as well. One nit: can we name the heading {{ClusterNodesMetrics}} instead of {{ClusterMetricsOnMetrics}}? > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4438) Implement RM leader election with curator
[ https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050391#comment-15050391 ] Karthik Kambatla commented on YARN-4438: Would very much like for us to use Curator for leader election. Maybe HDFS could also do the same in the future. Quickly skimmed through the patch. High-level comments: # IIRC we use the same ZK quorum for both leader election and the store. Can we re-use the CuratorFramework so leader election and store operations are fully consistent? Otherwise, the separate clients (and their individual timeouts etc.) could lead to inconsistencies. # Would it be possible to hide the implementation of the leader election - ActiveStandbyElector vs CuratorElector - behind EmbeddedElector, so that AdminService and the RM don't need to know the details? In any case, having written some Curator code in the past, I would like to review the code more closely. > Implement RM leader election with curator > - > > Key: YARN-4438 > URL: https://issues.apache.org/jira/browse/YARN-4438 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4438.1.patch > > > This is to implement the leader election with curator instead of the > ActiveStandbyElector from common package, this also avoids adding more > configs in common to suit RM's own needs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
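Point 2 above can be sketched as a small interface that hides the election mechanism. The class names below are illustrative stand-ins, not the actual YARN or Curator types:

```java
// Minimal sketch: keep the election mechanism (ActiveStandbyElector or a
// Curator-based elector) behind one interface so callers like AdminService
// never see the implementation. All names here are illustrative.
interface ElectorSketch {
    void enterElection();
    boolean isLeader();
}

// Stand-in for a Curator-backed implementation; a real one would wrap a
// shared CuratorFramework/LeaderLatch rather than flip a local flag.
class CuratorElectorSketch implements ElectorSketch {
    private volatile boolean leader;
    @Override public void enterElection() { leader = true; }
    @Override public boolean isLeader() { return leader; }
}

class ElectorDemo {
    public static void main(String[] args) {
        ElectorSketch elector = new CuratorElectorSketch(); // impls are swappable
        elector.enterElection();
        System.out.println(elector.isLeader());
    }
}
```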
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051679#comment-15051679 ] Hadoop QA commented on YARN-4416: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 1s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 20s {color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 20s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager introduced 1 new FindBugs issues. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 21s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 16s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 53s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 146m 48s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.getAbsoluteUsedCapacity() is unsynchronized,
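The FindBugs warning above ({{getAbsoluteUsedCapacity()}} is unsynchronized) is the flip side of the deadlock this JIRA addresses: a {{synchronized}} getter can participate in a lock-ordering cycle between a queue and its parent. A minimal, non-YARN sketch of the resulting pattern, with illustrative names, where the getter reads a volatile field instead of taking the monitor:

```java
// Illustrative sketch (not actual YARN code): making the getter read a
// volatile field means no monitor is acquired on the read path, so it cannot
// take part in a lock-ordering deadlock; writes remain synchronized.
class CSQueueSketch {
    private volatile float absoluteUsedCapacity; // safe to read without locking

    // Unsynchronized getter: no monitor acquired, no lock cycle possible here.
    float getAbsoluteUsedCapacity() {
        return absoluteUsedCapacity;
    }

    synchronized void setAbsoluteUsedCapacity(float v) {
        absoluteUsedCapacity = v;
    }
}
```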
[jira] [Updated] (YARN-4414) Nodemanager connection errors are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4414: --- Attachment: YARN-4414.1.patch > Nodemanager connection errors are retried at multiple levels > > > Key: YARN-4414 > URL: https://issues.apache.org/jira/browse/YARN-4414 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Jason Lowe >Assignee: Chang Li > Attachments: YARN-4414.1.patch > > > This is related to YARN-3238. Ran into more scenarios where connection > errors are being retried at multiple levels, like NoRouteToHostException. > The fix for YARN-3238 was too specific, and I think we need a more general > solution to catch a wider array of connection errors that can occur to avoid > retrying them both at the RPC layer and at the NM proxy layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
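A minimal, non-YARN sketch of the layered-retry problem described above: when both the RPC layer and the NM proxy layer retry the same connection error, the attempts multiply.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: two nested retry(3, ...) layers perform up to
// 3 * 3 = 9 underlying attempts for one logical call, which is the
// multiplication this JIRA wants to avoid.
class NestedRetryDemo {
    interface Op { void run() throws Exception; }

    // Retry op up to `attempts` times, rethrowing the last failure.
    static void retry(int attempts, Op op) throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try { op.run(); return; } catch (Exception e) { last = e; }
        }
        throw last;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Op failing = () -> { calls.incrementAndGet(); throw new Exception("no route to host"); };
        try {
            retry(3, () -> retry(3, failing)); // proxy-layer retry wrapping RPC-layer retry
        } catch (Exception ignored) { }
        System.out.println(calls.get()); // 9 underlying attempts for one logical call
    }
}
```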
[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off
[ https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051966#comment-15051966 ] Li Lu commented on YARN-4356: - I can see in the v.6 patch both concerns are addressed. Any other comments/suggestions/concerns? > ensure the timeline service v.2 is disabled cleanly and has no impact when > it's turned off > -- > > Key: YARN-4356 > URL: https://issues.apache.org/jira/browse/YARN-4356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: YARN-4356-feature-YARN-2928.002.patch, > YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, > YARN-4356-feature-YARN-2928.005.patch, YARN-4356-feature-YARN-2928.006.patch, > YARN-4356-feature-YARN-2928.poc.001.patch > > > For us to be able to merge the first milestone drop to trunk, we want to > ensure that once disabled the timeline service v.2 has no impact from the > server side to the client side. If the timeline service is not enabled, no > action should be done. If v.1 is enabled but not v.2, v.1 should behave the > same as it does before the merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4414) Nodemanager connection errors are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4414: --- Attachment: YARN-4414.1.2.patch > Nodemanager connection errors are retried at multiple levels > > > Key: YARN-4414 > URL: https://issues.apache.org/jira/browse/YARN-4414 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Jason Lowe >Assignee: Chang Li > Attachments: YARN-4414.1.2.patch, YARN-4414.1.patch > > > This is related to YARN-3238. Ran into more scenarios where connection > errors are being retried at multiple levels, like NoRouteToHostException. > The fix for YARN-3238 was too specific, and I think we need a more general > solution to catch a wider array of connection errors that can occur to avoid > retrying them both at the RPC layer and at the NM proxy layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off
[ https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-4356: -- Attachment: YARN-4356-feature-YARN-2928.006.patch Posted patch v.6. Addressed Junping's comment for moving the YarnConfiguration to the argument for PerNodeAuxService and TimelineReaderServer. Also, per Naga's comment I deprecated RM_SYSTEM_METRIC_PUBLISHER_ENABLED. > ensure the timeline service v.2 is disabled cleanly and has no impact when > it's turned off > -- > > Key: YARN-4356 > URL: https://issues.apache.org/jira/browse/YARN-4356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: YARN-4356-feature-YARN-2928.002.patch, > YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, > YARN-4356-feature-YARN-2928.005.patch, YARN-4356-feature-YARN-2928.006.patch, > YARN-4356-feature-YARN-2928.poc.001.patch > > > For us to be able to merge the first milestone drop to trunk, we want to > ensure that once disabled the timeline service v.2 has no impact from the > server side to the client side. If the timeline service is not enabled, no > action should be done. If v.1 is enabled but not v.2, v.1 should behave the > same as it does before the merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051956#comment-15051956 ] Li Lu commented on YARN-4224: - I scanned through the patch. The patch itself looks fine, but I would like to check with the broader community about the patterns proposed in this JIRA. Since this is blocking the ongoing web UI work, we really want to have a relatively stable interface before we proceed with changing the UI side. IIUC we're putting the mandatory parameters in a hierarchical order in the URL, and adding optional parameters as query parameters. This approach looks fine to me. For naming conventions, we really want to be consistent with the rest of the codebase. [~wangda] any suggestions/comments here, given this is quite related to the next-gen UI? Thanks! > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4224-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
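The convention discussed above (mandatory parameters in the URL path hierarchy, optional ones as query parameters) can be sketched as follows. The endpoint paths are hypothetical, not the actual ATSv2 REST routes:

```java
// Illustrative sketch: mandatory parts (cluster, app) form the path
// hierarchy; optional filters (limit) are appended as query parameters.
class RestConventionDemo {
    static String entityUrl(String cluster, String app, Integer limit) {
        String url = "/ws/v2/timeline/clusters/" + cluster + "/apps/" + app; // hypothetical path
        if (limit != null) {                    // optional parameter -> query param
            url += "?limit=" + limit;
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(entityUrl("c1", "app_1", 10));
        // -> /ws/v2/timeline/clusters/c1/apps/app_1?limit=10
    }
}
```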
[jira] [Updated] (YARN-3817) [Aggregation] Flow and User level aggregation on Application States table
[ https://issues.apache.org/jira/browse/YARN-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3817: Attachment: YARN-3817-poc-v1-rebase.patch Rebase the POC patch to the feature-YARN-2928 branch. The new patch is based on YARN-3816-feature-YARN-2928.v4.1.patch. > [Aggregation] Flow and User level aggregation on Application States table > - > > Key: YARN-3817 > URL: https://issues.apache.org/jira/browse/YARN-3817 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Li Lu > Attachments: Detail Design for Flow and User Level Aggregation.pdf, > YARN-3817-poc-v1-rebase.patch, YARN-3817-poc-v1.patch > > > We need time-based flow/user level aggregation to present flow/user related > states to end users. > Flow level represents summary info of a specific flow. User level aggregation > represents summary info of a specific user, it should include summary info of > accumulated and statistic means (by two levels: application and flow), like: > number of Flows, applications, resource consumption, resource means per app > or flow, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4219) New levelDB cache storage for timeline v1.5
[ https://issues.apache.org/jira/browse/YARN-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4219: Attachment: YARN-4219-YARN-4265.003.patch Fixed the synchronization issue on service stop. > New levelDB cache storage for timeline v1.5 > --- > > Key: YARN-4219 > URL: https://issues.apache.org/jira/browse/YARN-4219 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.8.0 >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4219-YARN-4265.001.patch, > YARN-4219-YARN-4265.002.patch, YARN-4219-YARN-4265.003.patch, > YARN-4219-trunk.001.patch, YARN-4219-trunk.002.patch, > YARN-4219-trunk.003.patch > > > We need to have an "offline" caching storage for timeline server v1.5 after > the changes in YARN-3942. The in memory timeline storage may run into OOM > issues when used as a cache storage for entity file timeline storage. We can > refactor the code and have a level db based caching storage for this use > case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050551#comment-15050551 ] Junping Du commented on YARN-3623: -- Ok. I think I already found the answer in YARN-4234: ATS 1.5 depends on this patch. Will go ahead and commit it to 2.8. > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
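A sketch of the idea behind this JIRA: frameworks read one version config instead of carrying their own v1/v2 flags. Plain {{java.util.Properties}} stands in for Hadoop's Configuration here, and the exact constant names and defaults in YarnConfiguration may differ:

```java
import java.util.Properties;

// Illustrative sketch: a single timeline-service version knob. Switching a
// cluster from v1 to v1.5/v2 then means changing this one value, not a
// per-framework flag. Key name and default are assumptions for this sketch.
class TimelineVersionDemo {
    static final String TIMELINE_SERVICE_VERSION = "yarn.timeline-service.version"; // assumed key
    static final String DEFAULT_VERSION = "1.0";

    static float getVersion(Properties conf) {
        return Float.parseFloat(conf.getProperty(TIMELINE_SERVICE_VERSION, DEFAULT_VERSION));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(getVersion(conf));      // default: 1.0
        conf.setProperty(TIMELINE_SERVICE_VERSION, "1.5");
        System.out.println(getVersion(conf));      // configured: 1.5
    }
}
```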
[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off
[ https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050645#comment-15050645 ] Junping Du commented on YARN-4356: -- Thanks [~sjlee0] for updating the patch! bq. This is the java 7 diamond operator (<>) which is a shorthand for inferring types. The type information is NOT removed. It's inferred by the compiler, and the compiler produces the same bytecode as specifying the types. Got it. Sounds like my coffee is stale and I need a new cup. :) bq. Got it. Can we proceed with the current patch and get that fix once YARN-3586 goes in? Sure. Just a reminder here, as I also tend to forget it myself. bq. This is addressing a javadoc error. The ampersand ("&") is a special character for javadoc, and it breaks javadoc. It needs to be entity-escaped. Oh, I see. Let's keep it here. bq. This is used for the test method launchServer(). This method is invoked directly by a unit test (thus the @VisibleForTesting annotation). The same for TimelineReaderServer. If so, I would prefer not to update the configuration in launchServer(), but to pass an updated configuration to the method instead, just as other daemons (RM, NM) do. The reason is that it could be hard to track the real configurations for daemons/services if we override them in internal logic. Maybe we should follow the same practice? bq. That's fine. I still put up the patch that includes a version of that because without it things won't even compile. I will wait until YARN-3623 goes in before I remove that piece from this patch, then this can get committed. Sounds good. I just committed YARN-3623 to trunk. Everything else looks good to me in the 005 patch. 
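For reference, the diamond operator discussed in the first quote can be demonstrated directly; both maps below have exactly the same type and contents:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The Java 7 diamond operator: the compiler infers the type arguments from
// the left-hand side, so no type information is lost and the same bytecode
// is produced as with the explicit form.
class DiamondDemo {
    static Map<String, List<Integer>> explicitDemo() {
        Map<String, List<Integer>> m = new HashMap<String, List<Integer>>(); // pre-Java-7 style
        m.put("a", new ArrayList<Integer>());
        return m;
    }

    static Map<String, List<Integer>> diamondDemo() {
        Map<String, List<Integer>> m = new HashMap<>(); // diamond: types inferred
        m.put("a", new ArrayList<>());
        return m;
    }

    public static void main(String[] args) {
        // Same contents, same runtime behavior: the diamond is pure shorthand.
        System.out.println(explicitDemo().equals(diamondDemo())); // true
    }
}
```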
> ensure the timeline service v.2 is disabled cleanly and has no impact when > it's turned off > -- > > Key: YARN-4356 > URL: https://issues.apache.org/jira/browse/YARN-4356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: YARN-4356-feature-YARN-2928.002.patch, > YARN-4356-feature-YARN-2928.003.patch, YARN-4356-feature-YARN-2928.004.patch, > YARN-4356-feature-YARN-2928.005.patch, > YARN-4356-feature-YARN-2928.poc.001.patch > > > For us to be able to merge the first milestone drop to trunk, we want to > ensure that once disabled the timeline service v.2 has no impact from the > server side to the client side. If the timeline service is not enabled, no > action should be done. If v.1 is enabled but not v.2, v.1 should behave the > same as it does before the merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3623: - Target Version/s: 2.8.0 (was: YARN-2928) > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050641#comment-15050641 ] Hadoop QA commented on YARN-4309: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 19s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 46s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 17s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s {color} | {color:red} Patch generated 3 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 358, now 358). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 46s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 0s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 54s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 17s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 20s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91. {color} | |
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050496#comment-15050496 ] Junping Du commented on YARN-3623: -- Oh. Just noticed the target version is YARN-2928, but I think ATS 1.5 needs this too, doesn't it? Shall we commit the patch to trunk/branch-2? > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050495#comment-15050495 ] Varun Vasudev commented on YARN-2934: - [~Naganarasimha] - one minor comment - can we make the tail size a config parameter? Currently it's hard-coded in the code. > Improve handling of container's stderr > --- > > Key: YARN-2934 > URL: https://issues.apache.org/jira/browse/YARN-2934 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Gera Shegalov >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, > YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch > > > Most YARN applications redirect stderr to some file. That's why when > container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
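The suggestion above (replace the hard-coded tail size with a config key plus a default) could look roughly like this; the key name is hypothetical and {{java.util.Properties}} stands in for Hadoop's Configuration:

```java
import java.util.Properties;

// Illustrative sketch: a configurable stderr tail size with a default,
// instead of a value hard-coded in the launcher. Key name and default are
// assumptions for this sketch, not actual YARN configuration.
class TailSizeConfigDemo {
    static final String TAIL_SIZE_KEY = "yarn.nodemanager.container.stderr.tail-bytes"; // hypothetical
    static final long DEFAULT_TAIL_SIZE = 4096L;

    static long getTailSize(Properties conf) {
        return Long.parseLong(conf.getProperty(TAIL_SIZE_KEY, Long.toString(DEFAULT_TAIL_SIZE)));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(getTailSize(conf));   // 4096 (default when unset)
        conf.setProperty(TAIL_SIZE_KEY, "1024");
        System.out.println(getTailSize(conf));   // 1024 (configured)
    }
}
```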
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050592#comment-15050592 ] Hudson commented on YARN-3623: -- FAILURE: Integrated in Hadoop-trunk-Commit #8950 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8950/]) YARN-3623. Add a config to indicate the Timeline Service version. (junping_du: rev f910e4f639dc311fcb257bfcb869b1aa8b2c0643) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050887#comment-15050887 ] Naganarasimha G R commented on YARN-2934: - Hi [~vvasudev], as per the [comment|https://issues.apache.org/jira/browse/YARN-2934?focusedCommentId=14983970=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14983970] from [~steve_l], he felt there was little chance that someone would configure it, but I am open to adding an additional configuration if that is not an issue! > Improve handling of container's stderr > --- > > Key: YARN-2934 > URL: https://issues.apache.org/jira/browse/YARN-2934 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Gera Shegalov >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, > YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch > > > Most YARN applications redirect stderr to some file. That's why when > container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050492#comment-15050492 ] Hadoop QA commented on YARN-1856: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 14s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 49s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 35s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 1m 4s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 11s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 47s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 47s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 36s {color} | {color:red} Patch generated 11 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 240, now 248). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 35s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 34s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 26s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 33s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 7s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 67m 0s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12776756/YARN-1856.004.patch | | JIRA Issue | YARN-1856 | | Optional Tests | asflicense
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050494#comment-15050494 ] Junping Du commented on YARN-3623: -- +1. Committing. > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4416: Description: While debugging in Eclipse, I came across a scenario where I had to find out the name of a queue, but every time I tried to inspect the queue it hung. On seeing the stack I realized there was a deadlock, but on analysis found that it was only due to *queue.toString()* during debugging, as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need to ensure the following: was: While debugging in Eclipse, I came across a scenario where I had to find out the name of a queue, but every time I tried to inspect the queue it hung. On seeing the stack I realized there was a deadlock, but on analysis found that it was only due to *queue.toString()* during debugging, as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized and would be better handled through read and write locks. > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > deadlock.log > > > While debugging in Eclipse, I came across a scenario where I had to find out > the name of a queue, but every time I tried to inspect the queue it hung. On > seeing the stack I realized there was a deadlock, but on analysis found that > it was only due to *queue.toString()* during debugging, as > {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need > to ensure the following: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4416: Issue Type: Sub-task (was: Bug) Parent: YARN-3091 > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > deadlock.log > > > While debugging in Eclipse, I came across a scenario where I had to find out > the name of a queue, but every time I tried to inspect the queue it hung. On > seeing the stack I realized there was a deadlock, but on analysis found that > it was only due to *queue.toString()* during debugging, as > {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized > and would be better handled through read and write locks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
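The read/write-lock approach suggested in the description could look roughly like the following. This is a minimal sketch with hypothetical field and class names, not the actual AbstractCSQueue code from any patch:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: guard queue state with a ReentrantReadWriteLock instead of
// synchronizing every getter, so concurrent readers (including a
// debugger's toString() call) never block each other; only writers
// take the exclusive lock.
public class QueueSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private float absoluteUsedCapacity = 0f;

  public float getAbsoluteUsedCapacity() {
    lock.readLock().lock(); // many readers may hold this simultaneously
    try {
      return absoluteUsedCapacity;
    } finally {
      lock.readLock().unlock();
    }
  }

  public void setAbsoluteUsedCapacity(float value) {
    lock.writeLock().lock(); // writers are exclusive
    try {
      this.absoluteUsedCapacity = value;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Since getters dominate on the scheduler's hot paths, this choice trades the single monitor for a lock that lets read-mostly traffic proceed in parallel.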
[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off
[ https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051306#comment-15051306 ] Hadoop QA commented on YARN-4356: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 12 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 28s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 7s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 55s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 27s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 7m 3s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 3m 18s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 54s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in feature-YARN-2928 has 3 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 50s {color} | {color:red} hadoop-yarn-common in feature-YARN-2928 failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 42s {color} | {color:red} hadoop-yarn-server-resourcemanager in feature-YARN-2928 failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 7m 59s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 13s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 36m 15s {color} | {color:red} root-jdk1.8.0_66 with JDK v1.8.0_66 generated 5 new issues (was 779, now 779). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 55s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 49m 10s {color} | {color:red} root-jdk1.7.0_91 with JDK v1.7.0_91 generated 5 new issues (was 772, now 772). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 55s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 29s {color} | {color:red} Patch generated 7 new checkstyle issues in root (total was 1970, now 1942). 
{color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 6m 53s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 3m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 14m 9s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 7m 8s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 100, now 100). {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 7m 8s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 100, now 100). {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 49s {color} | {color:green} the patch passed with
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051264#comment-15051264 ] Sangjin Lee commented on YARN-3623: --- That's why I asked whether you were OK with the wording. How about "it means the cluster should bring up the timeline service specified by the version" (dropping the exclusive phrase)? > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4424) Fix deadlock in RMAppImpl
[ https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051309#comment-15051309 ] Junping Du commented on YARN-4424: -- Hi [~leftnoteasy] and [~jianhe], Is this blocker for 2.6.3? If so, we should commit it to 2.6.3. > Fix deadlock in RMAppImpl > - > > Key: YARN-4424 > URL: https://issues.apache.org/jira/browse/YARN-4424 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Jian He >Priority: Blocker > Fix For: 2.7.2, 2.6.3 > > Attachments: YARN-4424.1.patch > > > {code} > yarn@XXX:/mnt/hadoopqe$ /usr/hdp/current/hadoop-yarn-client/bin/yarn > application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING > 15/12/04 21:59:54 INFO impl.TimelineClientImpl: Timeline service address: > http://XXX:8188/ws/v1/timeline/ > 15/12/04 21:59:54 INFO client.RMProxy: Connecting to ResourceManager at > XXX/0.0.0.0:8050 > 15/12/04 21:59:55 INFO client.AHSProxy: Connecting to Application History > server at XXX/0.0.0.0:10200 > {code} > {code:title=RM log} > 2015-12-04 21:59:19,744 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 237000 > 2015-12-04 22:00:50,945 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 238000 > 2015-12-04 22:02:22,416 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 239000 > 2015-12-04 22:03:53,593 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 24 > 2015-12-04 22:05:24,856 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 241000 > 2015-12-04 22:06:56,235 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 242000 > 2015-12-04 22:08:27,510 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 243000 > 2015-12-04 22:09:58,786 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of 
event-queue is 244000 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051377#comment-15051377 ] Li Lu commented on YARN-3623: - I'd like to make sure I understand this change correctly. Specifically, what kinds of new cases do we allow by removing the "nothing else" part? IIUC, this allows us to bring up an old ATS server after the system has been upgraded. This looks fine to me. Am I missing some other cases introduced by this change? > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
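For reference, the setting this JIRA introduces is read from yarn-site.xml; a cluster pinned to timeline service v.1.5 would carry something like the fragment below. The property name matches the committed patch, while the value and description text here are illustrative:

```xml
<property>
  <name>yarn.timeline-service.version</name>
  <value>1.5</value>
  <description>Indicates to clients which version of the timeline
  service the cluster is running, so that frameworks can switch from
  v1 to v2 by changing this one value instead of carrying their own
  per-framework version configs.</description>
</property>
```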
[jira] [Commented] (YARN-4438) Implement RM leader election with curator
[ https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052001#comment-15052001 ] Jian He commented on YARN-4438: --- bq. Otherwise, the clients (and their individual timeouts etc.) could lead to inconsistencies? Agreed; I was actually discussing the same with [~xgong] offline, and didn't do it only because I wanted to keep the change minimal. I'll make the change accordingly. bq. Would it be possible to hide the implementation of the leader-election - ActiveStandbyElector vs CuratorElector - behind EmbeddedElector? I'm thinking of removing EmbeddedElectorService later on and separating the LeaderElectorService out from the AdminService. Does that make sense? > Implement RM leader election with curator > - > > Key: YARN-4438 > URL: https://issues.apache.org/jira/browse/YARN-4438 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4438.1.patch > > > This is to implement the leader election with curator instead of the > ActiveStandbyElector from common package, this also avoids adding more > configs in common to suit RM's own needs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052022#comment-15052022 ] Vinod Kumar Vavilapalli commented on YARN-4224: --- We should model this around resources (as REST specifies) instead of around queries. Special purpose queries can be treated as shortcuts to existing resource hierarchy. Also, user is missing from the above set of examples. > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4224-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052026#comment-15052026 ] Lin Yiqun commented on YARN-4399: - I tested the failed unit tests and the errors are not related to the patch. > FairScheduler allocated container should resetSchedulingOpportunities count > of its priority > --- > > Key: YARN-4399 > URL: https://issues.apache.org/jira/browse/YARN-4399 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4399.001.patch, YARN-4399.002.patch, > YARN-4399.003.patch > > > There is a bug in FairScheduler's container allocation when the locality > configs are set. When the scheduler attempts to assign a container, it > invokes {{FSAppAttempt#addSchedulingOpportunity}} whether or not the > assignment succeeds. If yarn.scheduler.fair.locality.threshold.node and > yarn.scheduler.fair.locality.threshold.rack are configured, the > schedulingOpportunity value influences the locality of containers: when one > container is assigned successfully, the schedulingOpportunity count of its > priority is increased, and it is increased again for the second container. > This may degrade the allowedLocality of that priority, so the container ends > up handled as a rack request. So I think that when FairScheduler allocates a > container, if the previous container was assigned, the scheduling count of > its priority should be reset to 0, so that its value does not influence > container allocation in the next iteration; this will increase the locality > of containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
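The behaviour the description argues for can be modelled with a tiny counter sketch. The class and method names here are hypothetical; the real logic lives in FSAppAttempt:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed behaviour: a successful assignment resets the
// scheduling-opportunity counter for that priority, so the next container
// at the same priority is not pushed down toward RACK_LOCAL by
// opportunities accumulated before the previous success.
public class SchedulingOpportunities {
  private final Map<Integer, Integer> counts = new HashMap<>();

  // Called on every scheduling attempt at this priority.
  public void addOpportunity(int priority) {
    counts.merge(priority, 1, Integer::sum);
  }

  // Proposed fix: clear the counter once a container is actually placed.
  public void resetOnSuccessfulAssignment(int priority) {
    counts.put(priority, 0);
  }

  public int get(int priority) {
    return counts.getOrDefault(priority, 0);
  }
}
```

Without the reset, the count only ever grows, and the locality-threshold comparison keeps seeing a large value even after a node-local placement succeeded.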
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052038#comment-15052038 ] Li Lu commented on YARN-4224: - And, BTW, under the current resource model of ATS v2, a full path to locate an entity could look like: {code} /clusters/{clusterid}/users/{userid}/flows/{flowid}/flowruns/{flowrunid}/apps/{appid}/entities/{entityid} {code} Any stage in between will return the info of a specific entity (cluster, user, flow, flowrun, app), or list the next level of resources under it. > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4224-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4350) TestDistributedShell fails
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051998#comment-15051998 ] Naganarasimha G R commented on YARN-4350: - [~sjlee0], can you cherry-pick YARN-2859 into our branch, so that merging the branch will be simpler? > TestDistributedShell fails > -- > > Key: YARN-4350 > URL: https://issues.apache.org/jira/browse/YARN-4350 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-4350-feature-YARN-2928.001.patch > > > Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. > There seem to be 2 distinct issues. > (1) testDSShellWithoutDomainV2* tests fail sporadically > These tests fail more often than not if tested by themselves: > {noformat} > testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 30.998 sec <<< FAILURE! > java.lang.AssertionError: Application created event should be published > atleast once expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) > {noformat} > They start happening after YARN-4129. I suspect this might have to do with > some timing issue. > (2) the whole test times out > If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 > (just a hunch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052030#comment-15052030 ] Li Lu commented on YARN-4224: - OK, I've got a few references for the discussion. I looked at the WebHDFS REST APIs, but the use case there is not quite similar to ours. The RM REST APIs mostly have only one mandatory parameter, such as "/apps/{appid}/appattempt". The AHS web services are probably the most similar use case here, so we can borrow much of their resource model. For multiple parameters we organize them as an ordered sequence, each following its parameter name, such as "/apps/{appid}/appattempts/{appattemptid}/containers/{containerid}". Any API that does not end on a parameter (such as "/apps/{appid}/appattempts") is treated as a list. This appears to be the typical resource model in YARN; the MapReduce AMWebService is another example of this. Another thing: for special queries like flowapps, we can add them as shortcuts on the flow level, such as "/cluster/{clusterid}/user/{userid}/flow/{flowid}/apps". Could somebody please remind me why we decided to remove user from the path? Thanks! > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4224-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
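The hierarchical layout being discussed can be illustrated with a small path-building helper. This is purely illustrative (the class and method are hypothetical, not part of any patch); it just shows how every level follows the "collection-name/{id}" convention:

```java
// Illustrative helper that composes the hierarchical ATSv2 reader path
// discussed above: each resource level is a collection name followed by
// an id, e.g. /clusters/{clusterid}/users/{userid}/...
public class TimelinePath {
  public static String entityPath(String cluster, String user, String flow,
      String flowRun, String app, String entity) {
    // The leading "" produces the leading "/" when joined.
    return String.join("/",
        "", "clusters", cluster, "users", user, "flows", flow,
        "flowruns", flowRun, "apps", app, "entities", entity);
  }
}
```

Truncating the join at any level would yield the intermediate resources ("/clusters/c1/users/u1/flows/f1", etc.), matching the "each stage returns an entity or lists the next level" model.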
[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run
[ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052043#comment-15052043 ] Jun Gong commented on YARN-3998: [~vvasudev] Thanks for the attention and suggestion. We have a patch for it that implements the "restart on all errors" policy. I will attach the patch if that is OK. > Add retry-times to let NM re-launch container when it fails to run > -- > > Key: YARN-3998 > URL: https://issues.apache.org/jira/browse/YARN-3998 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jun Gong >Assignee: Jun Gong > > I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM > launches containers, it could specify the value. Then the NM will re-launch > the container 'retry-times' times when it fails to run (e.g. exit code is not > 0). > It will save a lot of time: it avoids repeating container localization, the > RM does not need to re-schedule the container, and local files in the > container's working directory are left for re-use. (If the container has > downloaded some big files, it does not need to re-download them when running > again.) > We find it is useful in systems like Storm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
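The "restart on all errors" policy described here amounts to a bounded retry loop around the launch. A minimal sketch, using a hypothetical launch callback rather than the actual NM interfaces:

```java
import java.util.function.IntSupplier;

// Sketch: re-run a container launch up to retryTimes extra attempts
// while it keeps exiting non-zero, mirroring the proposed
// "restart on all errors" policy. The working directory is assumed to
// be kept between attempts, so localized files are reused.
public class RelaunchSketch {
  public static int launchWithRetries(IntSupplier launch, int retryTimes) {
    int exitCode = launch.getAsInt();      // first attempt
    int attemptsLeft = retryTimes;
    while (exitCode != 0 && attemptsLeft-- > 0) {
      exitCode = launch.getAsInt();        // relaunch in place, no re-localization
    }
    return exitCode;                       // 0 on success, last exit code otherwise
  }
}
```

A richer policy (e.g. "restart only on specific exit codes", or with back-off) would replace the `exitCode != 0` predicate, which is where the per-policy configuration Varun suggests would plug in.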
[jira] [Commented] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052096#comment-15052096 ] Subru Krishnan commented on YARN-4340: -- [~seanpo03], I noticed there are a couple of minor checkstyle & Javadoc issues. Can you kindly fix those? Also, *TestReservationInputValidator::testSubmitReservationDoesnotExist* is failing; it seems like a minor fix. It would be great if you could try running it e2e on at least a single-node setup. Thanks for working on this. > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch, > YARN-4340.v6.patch, YARN-4340.v7.patch, YARN-4340.v8.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system on which reservation exists by "time-range, > reservation-id". > YARN-4420 has a dependency on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051985#comment-15051985 ] Naganarasimha G R commented on YARN-3946: - Hi [~wangda], Javadoc, checkstyle and unit test case issues are either not related to the patch or not valid to be taken care of. > Allow fetching exact reason as to why a submitted app is in ACCEPTED state in > CS > > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, > YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, > YARN-3946.v1.007.patch, YARN-3946.v1.008.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through the RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/memory requirement not being met, or AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
[ https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4440: Attachment: YARN-4440.002.patch Resolved the remaining error. > FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time > - > > Key: YARN-4440 > URL: https://issues.apache.org/jira/browse/YARN-4440 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4440.001.patch, YARN-4440.002.patch > > > It seems there is a bug in the {{FSAppAttempt#getAllowedLocalityLevelByTime}} > method > {code} > // default level is NODE_LOCAL > if (! allowedLocalityLevel.containsKey(priority)) { > allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL); > return NodeType.NODE_LOCAL; > } > {code} > On the first invocation, the method doesn't init the time in > lastScheduledContainer, which makes the following code run on the next > invocation: > {code} > // check waiting time > long waitTime = currentTimeMs; > if (lastScheduledContainer.containsKey(priority)) { > waitTime -= lastScheduledContainer.get(priority); > } else { > waitTime -= getStartTime(); > } > {code} > Here waitTime falls back to currentTimeMs minus the FsApp startTime, and > since the startTime is much earlier than currentTimeMs, the wait easily > exceeds the delay time and allowedLocality degrades. So we should record the > initial time for the priority to avoid comparing against the FsApp startTime > and degrading allowedLocalityLevel. This problem has a bigger negative impact > on small jobs. YARN-4399 also discusses some locality problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
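The degradation described in YARN-4440 above can be sketched in isolation. This is a minimal model with hypothetical names (LocalityDelaySketch and its methods are not the real FSAppAttempt code): the buggy variant never records a timestamp for a newly seen priority, so the next wait-time computation falls back to the app's start time and immediately exceeds the locality delay; the fixed variant records the time the priority is first seen.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of FSAppAttempt's locality wait computation. */
class LocalityDelaySketch {
  enum NodeType { NODE_LOCAL, RACK_LOCAL }

  private final Map<Integer, Long> lastScheduledContainer = new HashMap<>();
  private final Map<Integer, NodeType> allowedLocalityLevel = new HashMap<>();
  private final long startTime;

  LocalityDelaySketch(long startTime) { this.startTime = startTime; }

  /** Buggy variant: first call returns NODE_LOCAL but never records a timestamp. */
  NodeType getAllowedLocalityLevelBuggy(int priority, long nowMs, long delayMs) {
    if (!allowedLocalityLevel.containsKey(priority)) {
      allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
      return NodeType.NODE_LOCAL; // bug: lastScheduledContainer not initialized
    }
    return check(priority, nowMs, delayMs);
  }

  /** Fixed variant: record the time the priority is first seen. */
  NodeType getAllowedLocalityLevelFixed(int priority, long nowMs, long delayMs) {
    if (!allowedLocalityLevel.containsKey(priority)) {
      allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
      lastScheduledContainer.put(priority, nowMs); // the fix
      return NodeType.NODE_LOCAL;
    }
    return check(priority, nowMs, delayMs);
  }

  private NodeType check(int priority, long nowMs, long delayMs) {
    // Without an entry, waitTime is measured from app start, not last schedule.
    long waitTime = nowMs - lastScheduledContainer.getOrDefault(priority, startTime);
    if (waitTime > delayMs) {
      allowedLocalityLevel.put(priority, NodeType.RACK_LOCAL); // degrade
    }
    return allowedLocalityLevel.get(priority);
  }
}
```

The sketch only models the timing arithmetic; the actual patch also has to cover the full NODE_LOCAL → RACK_LOCAL → OFF_SWITCH degradation chain.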
[jira] [Commented] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism
[ https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052108#comment-15052108 ] Sidharta Seethana commented on YARN-3542: - Hi [~vvasudev], thanks for the updated patch. Please see comments below. h4. CGroupsCpuResourceHandlerImpl.java {code} import org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler; import org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler; import org.apache.hadoop.yarn.server.nodemanager.util.LCEResourcesHandler; {code} These are unused imports in CGroupsCpuResourceHandlerImpl. {code} @VisibleForTesting static final String CPU_PERIOD_US = "cfs_period_us"; @VisibleForTesting static final String CPU_QUOTA_US = "cfs_quota_us"; @VisibleForTesting static final String CPU_SHARES = "shares"; {code} Move these to CGroupsHandler? {code} int quotaUS = MAX_QUOTA_US; int periodUS = (int) (MAX_QUOTA_US / yarnProcessors); {code} About how shares/cfs_period_us/cfs_quota_us are used: additional comments/documentation and examples (as unit tests?) would be useful. It took me a while to trace through the code using some examples. h4. CGroupsLCEResourcesHandler.java Since the class has been marked deprecated, is it necessary to make the rest of the changes that are included? h4.
LinuxContainerExecutor.java {code} private LCEResourcesHandler getResourcesHandler(Configuration conf) { LCEResourcesHandler handler = ReflectionUtils.newInstance( conf.getClass(YarnConfiguration.NM_LINUX_CONTAINER_RESOURCES_HANDLER, DefaultLCEResourcesHandler.class, LCEResourcesHandler.class), conf); // Stop using CgroupsLCEResourcesHandler // use the resource handler chain instead // ResourceHandlerModule will create the cgroup cpu module if // CgroupsLCEResourcesHandler is set if (handler instanceof CgroupsLCEResourcesHandler) { handler = ReflectionUtils.newInstance(DefaultLCEResourcesHandler.class, conf); } handler.setConf(conf); return handler; } {code} Since all resource handling is now in the resource handler chain - there is no longer a need to have references to LCEResourcesHandler in LinuxContainerExecutor - all related config handling etc. should be in ResourceHandlerModule.java (which already seems to be the case). IMO, all references to LCEResourcesHandler (and sub-classes) should be removed from LinuxContainerExecutor. > Re-factor support for CPU as a resource using the new ResourceHandler > mechanism > --- > > Key: YARN-3542 > URL: https://issues.apache.org/jira/browse/YARN-3542 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana >Priority: Critical > Attachments: YARN-3542.001.patch, YARN-3542.002.patch, > YARN-3542.003.patch > > > In YARN-3443 , a new ResourceHandler mechanism was added which enabled easier > addition of new resource types in the nodemanager (this was used for network > as a resource - See YARN-2140 ). We should refactor the existing CPU > implementation ( LinuxContainerExecutor/CgroupsLCEResourcesHandler ) using > the new ResourceHandler mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
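To illustrate the shares/cfs_period_us/cfs_quota_us comment above, here is a sketch of the arithmetic in the quoted snippet (hypothetical helper names; the MIN_PERIOD_US clamp is an assumption modeled on how CFS bandwidth values are typically bounded, not necessarily what the patch does): the quota is pinned at the 1-second kernel ceiling and the period is shrunk so that quota/period equals the number of processors YARN may use, i.e. processes in the cgroup get at most that many CPUs' worth of runtime per period.

```java
/** Hypothetical sketch of mapping a CPU limit to cfs_period_us / cfs_quota_us. */
class CfsLimitsSketch {
  static final int MAX_QUOTA_US = 1000 * 1000; // 1s ceiling on cfs_quota_us
  static final int MIN_PERIOD_US = 1000;       // assumed lower bound on the period

  /**
   * Returns {periodUS, quotaUS} such that quotaUS / periodUS == yarnProcessors,
   * mirroring the snippet from the review: quota stays at the max, the period
   * shrinks in proportion to the allowed processor count.
   */
  static int[] overallLimits(float yarnProcessors) {
    int quotaUS = MAX_QUOTA_US;
    int periodUS = (int) (MAX_QUOTA_US / yarnProcessors);
    if (periodUS < MIN_PERIOD_US) {
      // Very large processor counts: clamp the period, scale the quota instead,
      // preserving the same quota/period ratio.
      periodUS = MIN_PERIOD_US;
      quotaUS = (int) (yarnProcessors * MIN_PERIOD_US);
    }
    return new int[] { periodUS, quotaUS };
  }
}
```

For example, yarnProcessors = 2 yields period 500000µs with quota 1000000µs, i.e. a 2-CPU cap; unit tests of exactly this shape would answer the documentation request in the review.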
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052035#comment-15052035 ] Li Lu commented on YARN-4224: - Reformat: OK I've got a few references for the discussion. I looked at the WebHDFS REST APIs but the use case there is not quite similar to our use case here. The RM REST APIs mostly only have one mandatory parameter, such as {{/apps/\{appid\}/appattempt}}. AHS web services is probably the most similar use case here, so we can borrow much of its resource model. For multiple parameters we organize them as an ordered sequence, each value following its parameter name, such as {{/apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}}}. Any API that does not end on a parameter (such as {{/apps/\{appid\}/appattempts}}) is treated as a list. This appears to be the typical resource model in YARN. The MapReduce AMWebService is another example of this. Another thing is, for special queries like flowapps, we can add them as shortcuts on the flow level, such as {{/cluster/\{clusterid\}/user/\{userid\}/flow/\{flowid\}/apps}}. Could somebody please remind me why we decided to remove user from the path? Thanks! > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4224-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052067#comment-15052067 ] Subru Krishnan commented on YARN-4340: -- Thanks for addressing my feedback [~seanpo03]. The latest patch (v8) LGTM. [~curino], can you take a look and commit. > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch, > YARN-4340.v6.patch, YARN-4340.v7.patch, YARN-4340.v8.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system on which reservation exists by "time-range, > reservation-id". > YARN-4420 has a dependency on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3226: -- Attachment: 0004-YARN-3226.patch Thank You [~rohithsharma] for the comments. Addressing the same in the new patch. Also addressing the comments given by [~djp] earlier. Kindly help to review. > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned
[ https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051245#comment-15051245 ] Naganarasimha G R commented on YARN-4415: - Hi [~wangda], bq. User doesn't need to update configurations a lot if new labels added (Assume partition will be shared to all queues) User has to change configurations a lot if new labels added (Assume partition will be shared to few queues only) Sorry, I was not able to get your thoughts here ... what's the difference you are trying to indicate between updating and changing configurations? If Maximum-capacity for partitions is set to 100, what needs to be modified? How is it different from the default max capacity configuration for the default partition? I understand guaranteed capacity needs to be set to zero, but why does max cap need to be modified when it's shared to a few queues only? > Scheduler Web Ui shows max capacity for the queue is 100% but when we submit > application doesnt get assigned > > > Key: YARN-4415 > URL: https://issues.apache.org/jira/browse/YARN-4415 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.2 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: App info with diagnostics info.png, > capacity-scheduler.xml, screenshot-1.png > > > Steps to reproduce the issue : > Scenario 1: > # Configure a queue(default) with accessible node labels as * > # create a exclusive partition *xxx* and map a NM to it > # ensure no capacities are configured for default for label xxx > # start an RM app with queue as default and label as xxx > # application is stuck but scheduler ui shows 100% as max capacity for that > queue > Scenario 2: > # create a nonexclusive partition *sharedPartition* and map a NM to it > # ensure no capacities are configured for default queue > # start an RM app with queue as *default* and label as *sharedPartition* > # application is stuck but scheduler ui shows 100% as max capacity for that > queue for *sharedPartition* > For both issues the cause is the same: default max capacity and abs max > capacity are set to zero % -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4416: Description: While debugging in eclipse came across a scenario where in i had to get to know the name of the queue but every time i tried to see the queue it was getting hung. On seeing the stack realized there was a deadlock but on analysis found out that it was only due to *queue.toString()* during debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need to ensure following : # queueCapacity, resource-usage has their own read/write lock. # was: While debugging in eclipse came across a scenario where in i had to get to know the name of the queue but every time i tried to see the queue it was getting hung. On seeing the stack realized there was a deadlock but on analysis found out that it was only due to *queue.toString()* during debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Hence we need to ensure following : > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > deadlock.log > > > While debugging in eclipse came across a scenario where in i had to get to > know the name of the queue but every time i tried to see the queue it was > getting hung. On seeing the stack realized there was a deadlock but on > analysis found out that it was only due to *queue.toString()* during > debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Hence we need to ensure following : > # queueCapacity, resource-usage has their own read/write lock. > # -- This message was sent by Atlassian JIRA (v6.3.4#6332)
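Point 1 above — giving queueCapacity/resource-usage a dedicated read/write lock instead of relying on the queue's synchronized methods — can be sketched as follows (hypothetical class, not the actual AbstractCSQueue change): readers such as getAbsoluteUsedCapacity() never take the queue monitor, so a debugger or log statement calling toString() cannot join a lock cycle with a thread that already holds another monitor.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: guard a queue's usage numbers with a dedicated RW lock. */
class QueueUsageSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private float absoluteUsedCapacity;

  /** Many concurrent readers; none of them holds the queue's monitor. */
  float getAbsoluteUsedCapacity() {
    lock.readLock().lock();
    try {
      return absoluteUsedCapacity;
    } finally {
      lock.readLock().unlock();
    }
  }

  void setAbsoluteUsedCapacity(float v) {
    lock.writeLock().lock();
    try {
      absoluteUsedCapacity = v;
    } finally {
      lock.writeLock().unlock();
    }
  }

  @Override
  public String toString() {
    // Safe from a debugger: only the read lock is taken, never the monitor.
    return "usedCapacity=" + getAbsoluteUsedCapacity();
  }
}
```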
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051426#comment-15051426 ] Li Lu commented on YARN-3623: - Agree. +1 for the change. > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, > YARN-3623-2015-12-09.patch, YARN-3623-addendum.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4350) TestDistributedShell fails
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052333#comment-15052333 ] Sangjin Lee commented on YARN-4350: --- I pushed that commit. Let me know if you need anything else. Thanks. > TestDistributedShell fails > -- > > Key: YARN-4350 > URL: https://issues.apache.org/jira/browse/YARN-4350 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-4350-feature-YARN-2928.001.patch > > > Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. > There seem to be 2 distinct issues. > (1) testDSShellWithoutDomainV2* tests fail sporadically > These test fail more often than not if tested by themselves: > {noformat} > testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 30.998 sec <<< FAILURE! > java.lang.AssertionError: Application created event should be published > atleast once expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) > {noformat} > They start happening after YARN-4129. I suspect this might have to do with > some timing issue. > (2) the whole test times out > If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 > (just a hunch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051132#comment-15051132 ] Hadoop QA commented on YARN-4399: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 28s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 28s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 30s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 30s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 32s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 30s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 27s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 35s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 18m 45s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12776824/YARN-4399.002.patch | | JIRA Issue | YARN-4399 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux
[jira] [Commented] (YARN-3623) We should have a config to indicate the Timeline Service version
[ https://issues.apache.org/jira/browse/YARN-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051133#comment-15051133 ] Hudson commented on YARN-3623: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #682 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/682/]) YARN-3623. Add a config to indicate the Timeline Service version. (junping_du: rev f910e4f639dc311fcb257bfcb869b1aa8b2c0643) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt > We should have a config to indicate the Timeline Service version > > > Key: YARN-3623 > URL: https://issues.apache.org/jira/browse/YARN-3623 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-3623-2015-11-19.1.patch, YARN-3623-2015-12-09.patch > > > So far RM, MR AM, DA AM added/changed new config to enable the feature to > write the timeline data to v2 server. It's good to have a YARN > timeline-service.version config like timeline-service.enable to indicate the > version of the running timeline service with the given YARN cluster. It's > beneficial for users to more smoothly move from v1 to v2, as they don't need > to change the existing config, but switch this config from v1 to v2. And each > framework doesn't need to have their own v1/v2 config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4442) create new application operation from the Rest API should be logged in the audit log
[ https://issues.apache.org/jira/browse/YARN-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051136#comment-15051136 ] Varun Vasudev commented on YARN-4442: - Any reason why it should be logged in the REST API but not for the RPC API? > create new application operation from the Rest API should be logged in the > audit log > > > Key: YARN-4442 > URL: https://issues.apache.org/jira/browse/YARN-4442 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > create new application operation from the Rest API > ("/ws/v1/cluster/apps/new-application") should be logged in > the audit log -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4350) TestDistributedShell fails
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052179#comment-15052179 ] Sangjin Lee commented on YARN-4350: --- You mean commit [f114e728da6e19f3d35ff0cfef9fceea26aa5d46|https://github.com/apache/hadoop/commit/f114e728da6e19f3d35ff0cfef9fceea26aa5d46] specifically? > TestDistributedShell fails > -- > > Key: YARN-4350 > URL: https://issues.apache.org/jira/browse/YARN-4350 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-4350-feature-YARN-2928.001.patch > > > Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. > There seem to be 2 distinct issues. > (1) testDSShellWithoutDomainV2* tests fail sporadically > These test fail more often than not if tested by themselves: > {noformat} > testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 30.998 sec <<< FAILURE! > java.lang.AssertionError: Application created event should be published > atleast once expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) > {noformat} > They start happening after YARN-4129. I suspect this might have to do with > some timing issue. 
> (2) the whole test times out > If you run the whole TestDistributedShell test, it times out without fail. > This may or may not have to do with the port change introduced by YARN-2859 > (just a hunch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3432) Cluster metrics have wrong Total Memory when there is reserved memory on CS
[ https://issues.apache.org/jira/browse/YARN-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052251#comment-15052251 ] Akira AJISAKA commented on YARN-3432: - bq. Rethinking this, it's better to change how to calculate availableMB in CapacityScheduler. availableMB should include reservedMB for consistency. or to change how to calculate availableMB in FairScheduler to exclude reservedMB. As [~skaterxu] commented, we need to take care of vcores as well. > Cluster metrics have wrong Total Memory when there is reserved memory on CS > --- > > Key: YARN-3432 > URL: https://issues.apache.org/jira/browse/YARN-3432 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Brahma Reddy Battula > Attachments: YARN-3432-002.patch, YARN-3432.patch > > > I noticed that when reservations happen when using the Capacity Scheduler, > the UI and web services report the wrong total memory. > For example. I have a 300GB of total memory in my cluster. I allocate 50 > and I reserve 10. The cluster metrics for total memory get reported as 290GB. > This was broken by https://issues.apache.org/jira/browse/YARN-656 so perhaps > there is a difference between fair scheduler and capacity scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
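The arithmetic behind the YARN-3432 report is easy to reproduce (illustrative helper with hypothetical names, using the numbers from the description: 300 GB total, 50 GB allocated, 10 GB reserved, hence 240 GB available): deriving the total as available + allocated silently drops the reserved memory, while including reserved restores the expected 300 GB. As noted above, the same reasoning applies to vcores.

```java
/** Sketch of the cluster-metrics arithmetic from the report (hypothetical helper). */
class ClusterMetricsSketch {
  // Derivation that loses reserved memory -> under-reports the cluster total.
  static long totalWithoutReserved(long availableMB, long allocatedMB) {
    return availableMB + allocatedMB;
  }

  // Consistent variant: reserved memory is still part of the cluster.
  static long totalWithReserved(long availableMB, long allocatedMB, long reservedMB) {
    return availableMB + allocatedMB + reservedMB;
  }
}
```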
[jira] [Commented] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications
[ https://issues.apache.org/jira/browse/YARN-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052256#comment-15052256 ] Mohammad Shahid Khan commented on YARN-4441: If an application is already finished, then killing the application does not make any sense. From the UI we can call the kill operation, and that event is logged as below: in the audit log we get a success message for the Kill Application Request operation. The problem is in the first if check of the code below. Suppose application_xyz is already finished; when the kill request comes, the target state is KILLED while the actual state is finished, so the first if check is true. The second if check is also true since the target is KILLED, so the killApp method is called and logs USER=dr.who OPERATION=Kill Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_xyz | RMAuditLogger.java:91 USER=dr.who OPERATION=Kill Application Request TARGET=RMWebService RESULT=SUCCESS APPID=application_xyz | RMAuditLogger.java:91 {code} if (!app.getState().toString().equals(targetState.getState())) { // user is attempting to change state. right we only // allow users to kill the app if (targetState.getState().equals(YarnApplicationState.KILLED.toString())) { return killApp(app, callerUGI, hsr); } throw new BadRequestException("Only '" + YarnApplicationState.KILLED.toString() + "' is allowed as a target state."); } {code} I think we should not allow calling killApp if the application is already finished. 
> Kill application request from the webservice(ui) is showing success even for > the finished applications > -- > > Key: YARN-4441 > URL: https://issues.apache.org/jira/browse/YARN-4441 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > If the application is already finished ie either failled, killed, or succeded > the kill operation should not be logged as success. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
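The guard suggested in the comment above can be sketched as follows (hypothetical names; the real check would live in RMWebServices next to the quoted snippet, and the real states come from YarnApplicationState): reject the kill before it reaches killApp whenever the application is already in a terminal state, so the audit log never records a misleading SUCCESS for a finished app.

```java
import java.util.EnumSet;

/** Hypothetical sketch of rejecting kill requests for finished applications. */
class KillGuardSketch {
  enum AppState { NEW, RUNNING, FINISHED, FAILED, KILLED }

  private static final EnumSet<AppState> TERMINAL =
      EnumSet.of(AppState.FINISHED, AppState.FAILED, AppState.KILLED);

  /** Returns true iff a kill request may proceed (and be audit-logged as SUCCESS). */
  static boolean mayKill(AppState current) {
    // Already-terminal apps cannot be killed; logging SUCCESS would be misleading.
    return !TERMINAL.contains(current);
  }
}
```

A caller would check mayKill(app.getState()) before invoking killApp and otherwise return a client error instead of writing a SUCCESS audit entry.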
[jira] [Updated] (YARN-4194) Extend Reservation Definition Langauge (RDL) extensions to support node labels
[ https://issues.apache.org/jira/browse/YARN-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Tumanov updated YARN-4194: - Attachment: YARN-4194-v2.patch My patch test returns success: +1 overall __ < Success! > -- | Vote | Subsystem | Runtime | Comment | 0 | pre-patch | 21m 6s| Pre-patch trunk compilation is | | || healthy. | +1 |@author | 0m 0s | The patch does not contain any | | || @author tags. | +1 | tests included | 0m 0s | The patch appears to include 2 new | | || or modified test files. | +1 | javac | 8m 56s| There were no new javac warning | | || messages. | +1 |javadoc | 11m 13s | There were no new javadoc warning | | || messages. | +1 | release audit | 0m 23s| The applied patch does not increase | | || the total number of release audit | | || warnings. | +1 | checkstyle | 2m 3s | There were no new checkstyle | | || issues. | +1 | whitespace | 0m 1s | The patch has no lines that end in | | || whitespace. | +1 |install | 1m 40s| mvn install still works. | +1 |eclipse:eclipse | 0m 45s| The patch built with | | || eclipse:eclipse. | +1 | findbugs | 3m 20s| The patch does not introduce any | | || new Findbugs (version 3.0.0) | | || warnings. | | | 49m 28s | || Subsystem || Report/Notes || | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 50edcb9 | | Java | 1.7.0_91 | > Extend Reservation Definition Langauge (RDL) extensions to support node labels > -- > > Key: YARN-4194 > URL: https://issues.apache.org/jira/browse/YARN-4194 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Alexey Tumanov > Attachments: YARN-4194-v1.patch, YARN-4194-v2.patch > > > This JIRA tracks changes to the APIs to the reservation system to support > the expressivity of node-labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent
[ https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052270#comment-15052270 ] Akira AJISAKA commented on YARN-4358: - +1 pending Jenkins. Thanks Arun, Carlo, and Subru. bq. Interestingly for both me and him the mvn package works just fine (maybe different JDK?). Only JDK8 hits this issue. > Improve relationship between SharingPolicy and ReservationAgent > --- > > Key: YARN-4358 > URL: https://issues.apache.org/jira/browse/YARN-4358 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.8.0 > > Attachments: YARN-4358.2.patch, YARN-4358.3.patch, YARN-4358.4.patch, > YARN-4358.addendum-2.patch, YARN-4358.addendum.patch, YARN-4358.patch > > > At the moment an agent places based on available resources, but has no > visibility to extra constraints imposed by the SharingPolicy. While not all > constraints are easily represented some (e.g., max-instantaneous resources) > are easily represented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Yarn Running Application Logs
Hi, I want to collect the logs of a running YARN application. When I try # yarn logs -applicationId it gives the error "application has not completed. Logs are only available after an application completes", but I can see the logs through the ResourceManager web UI. Can anybody help me with how to collect the logs in a file? -- Sincere Regards, A.Kishore Kumar, Ph: +91 9246274575
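For a finished application (with log aggregation enabled), `yarn logs -applicationId` writes the aggregated logs to stdout, which can be redirected to a file. While the application is still running, one workaround is to pull per-container logs from each NodeManager's web interface, the same pages the ResourceManager UI links to. A rough sketch, where the host, port, user, and container ID are hypothetical placeholders (substitute your own):

```shell
# Hypothetical values for illustration only; substitute your own.
NM_HOST="nm1.example.com"            # NodeManager host running the container
NM_HTTP_PORT=8042                    # default NodeManager web UI port
CONTAINER_ID="container_1450000000000_0001_01_000001"
APP_USER="kishore"

# Once the application has finished (and log aggregation is enabled):
#   yarn logs -applicationId <applicationId> > app.log

# While it is still running, each container's logs are served by the NM web UI:
URL="http://${NM_HOST}:${NM_HTTP_PORT}/node/containerlogs/${CONTAINER_ID}/${APP_USER}"
echo "$URL"
# Fetch into a file, e.g.:
#   curl -s "$URL" > container.log
```

The container IDs and their NodeManager hosts are visible on the application attempt page of the ResourceManager UI.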
[jira] [Reopened] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run
[ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev reopened YARN-3998: - Re-opening - I think this is a useful feature for YARN to support. > Add retry-times to let NM re-launch container when it fails to run > -- > > Key: YARN-3998 > URL: https://issues.apache.org/jira/browse/YARN-3998 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jun Gong >Assignee: Jun Gong > > I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM > launches containers, it could specify the value. Then the NM will re-launch the > container 'retry-times' times when it fails to run (e.g. exit code is not 0). > It will save a lot of time: it avoids container localization, the RM does not > need to re-schedule the container, and local files in the container's working > directory are left for re-use. (If a container has downloaded some big > files, it does not need to re-download them when running again.) > We find this useful in systems like Storm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051043#comment-15051043 ] Hadoop QA commented on YARN-4340: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 10 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 7s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 2s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 2s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 16s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s {color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 53s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 51s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 19m 34s {color} | {color:red} root-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 13, now 13). {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 8m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 24s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 24s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 28m 59s {color} | {color:red} root-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 729, now 729). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 24s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 0s {color} | {color:red} Patch generated 1 new checkstyle issues in root (total was 353, now 352). 
{color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 29s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 43s {color} | {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 15s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 8m 48s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.7.0_91 with JDK v1.7.0_91 generated 3 new issues (was 0, now 3). {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 8m 48s {color} | {color:red}
[jira] [Commented] (YARN-3876) get_executable() assumes everything is Linux
[ https://issues.apache.org/jira/browse/YARN-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051038#comment-15051038 ] Alan Burlison commented on YARN-3876: - In addition, {{get_executable()}} returns a buffer that is never freed by its caller, so there is a memory leak. > get_executable() assumes everything is Linux > > > Key: YARN-3876 > URL: https://issues.apache.org/jira/browse/YARN-3876 > Project: Hadoop YARN > Issue Type: Sub-task > Components: build >Affects Versions: 2.7.0 >Reporter: Alan Burlison > > get_executable() in container-executor.c is non-portable and is hard-coded to > assume Linux's /proc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4422) Generic AHS sometimes doesn't show started, node, or logs on App page
[ https://issues.apache.org/jira/browse/YARN-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051061#comment-15051061 ] Eric Payne commented on YARN-4422: -- bq. Thanks! Will this fix address MAPREDUCE-5502 or MAPREDUCE-4428? It doesn't seem so, but would like to confirm. [~mingma], thanks for your interest. No, this JIRA does not fix the issue documented in MAPREDUCE-5502 or MAPREDUCE-4428. This JIRA only affects the Generic application history server's GUI and not the RM Application GUI. Also, as documented in those JIRAs, the problem is not a missing link in the GUI, but that the log history is missing altogether. > Generic AHS sometimes doesn't show started, node, or logs on App page > - > > Key: YARN-4422 > URL: https://issues.apache.org/jira/browse/YARN-4422 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 3.0.0, 2.8.0, 2.7.3 > > Attachments: AppAttemptPage no container or node.jpg, AppPage no logs > or node.jpg, YARN-4422.001.patch > > > Sometimes the AM container for an app isn't able to start the JVM. This can > happen if bogus JVM options are given to the AM container ( > {{-Dyarn.app.mapreduce.am.command-opts=-InvalidJvmOption}}) or when > misconfiguring the AM container's environment variables > ({{-Dyarn.app.mapreduce.am.env="JAVA_HOME=/foo/bar/baz}}) > When the AM container for an app isn't able to start the JVM, the Application > page for that application shows {{N/A}} for the {{Started}}, {{Node}}, and > {{Logs}} columns. It _does_ have links for each app attempt, and if you click > on one of them, you go to the Application Attempt page, where you can see all > containers with links to their logs and nodes, including the AM container. > But none of that shows up for the app attempts on the Application page. > Also, on the Application Attempt page, in the {{Application Attempt > Overview}} section, the {{AM Container}} value is {{null}} and the {{Node}} > value is {{N/A}}. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run
[ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051024#comment-15051024 ] Varun Vasudev commented on YARN-3998: - [~hex108] - do you wish to work on this? Do you have a patch that adds support for this? > Add retry-times to let NM re-launch container when it fails to run > -- > > Key: YARN-3998 > URL: https://issues.apache.org/jira/browse/YARN-3998 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jun Gong >Assignee: Jun Gong > > I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM > launches containers, it could specify the value. Then the NM will re-launch the > container 'retry-times' times when it fails to run (e.g. exit code is not 0). > It will save a lot of time: it avoids container localization, the RM does not > need to re-schedule the container, and local files in the container's working > directory are left for re-use. (If a container has downloaded some big > files, it does not need to re-download them when running again.) > We find this useful in systems like Storm. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
Lin Yiqun created YARN-4440: --- Summary: FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time Key: YARN-4440 URL: https://issues.apache.org/jira/browse/YARN-4440 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.1 Reporter: Lin Yiqun Assignee: Lin Yiqun It seems there is a bug in the {{FSAppAttempt#getAllowedLocalityLevelByTime}} method:
{code}
// default level is NODE_LOCAL
if (!allowedLocalityLevel.containsKey(priority)) {
  allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
  return NodeType.NODE_LOCAL;
}
{code}
On the first invocation it does not record a time in lastScheduledContainer, which causes the following code to execute on the next invocation:
{code}
// check waiting time
long waitTime = currentTimeMs;
if (lastScheduledContainer.containsKey(priority)) {
  waitTime -= lastScheduledContainer.get(priority);
} else {
  waitTime -= getStartTime();
}
{code}
Here waitTime is computed against the FsApp start time, which is earlier than currentTimeMs, so it easily exceeds the delay time and allowedLocality degrades. So we should record an initial time for the priority to prevent comparing against the FsApp start time and degrading the allowedLocalityLevel. This problem has a larger negative impact on small jobs. YARN-4399 also discusses some locality-related problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
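The suggested fix, recording an initial time for the priority on the first invocation, can be sketched with a simplified model of the method. The names and the degrade policy below are illustrative stand-ins, not the actual FSAppAttempt code:

```java
import java.util.HashMap;
import java.util.Map;

public class LocalitySketch {
    enum NodeType { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    private final Map<Integer, NodeType> allowedLocalityLevel = new HashMap<>();
    private final Map<Integer, Long> lastScheduledContainer = new HashMap<>();
    private final long appStartTime;

    LocalitySketch(long appStartTime) { this.appStartTime = appStartTime; }

    NodeType getAllowedLocalityLevelByTime(int priority, long nodeLocalityDelayMs,
                                           long currentTimeMs) {
        if (!allowedLocalityLevel.containsKey(priority)) {
            // The proposed fix: initialize the last-scheduled time here so the
            // next call does not fall back to the (much earlier) app start time.
            lastScheduledContainer.put(priority, currentTimeMs);
            allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
            return NodeType.NODE_LOCAL;
        }
        // Wait time is now measured from the first call, not from app start.
        long waitTime = currentTimeMs
            - lastScheduledContainer.getOrDefault(priority, appStartTime);
        if (waitTime > nodeLocalityDelayMs) {
            allowedLocalityLevel.put(priority, NodeType.RACK_LOCAL); // degrade
        }
        return allowedLocalityLevel.get(priority);
    }
}
```

With this initialization, a call shortly after the first one no longer sees a wait time inflated by the application's start time, so locality is not degraded prematurely.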
[jira] [Updated] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
[ https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4440: Attachment: YARN-4440.001.patch > FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time > - > > Key: YARN-4440 > URL: https://issues.apache.org/jira/browse/YARN-4440 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4440.001.patch > > > It seems there is a bug in the {{FSAppAttempt#getAllowedLocalityLevelByTime}} > method: > {code} > // default level is NODE_LOCAL > if (!allowedLocalityLevel.containsKey(priority)) { > allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL); > return NodeType.NODE_LOCAL; > } > {code} > On the first invocation it does not record a time in lastScheduledContainer, > which causes the following code to execute on the next invocation: > {code} > // check waiting time > long waitTime = currentTimeMs; > if (lastScheduledContainer.containsKey(priority)) { > waitTime -= lastScheduledContainer.get(priority); > } else { > waitTime -= getStartTime(); > } > {code} > Here waitTime is computed against the FsApp start time, which is earlier than > currentTimeMs, so it easily exceeds the delay time and allowedLocality degrades. > So we should record an initial time for the priority to prevent comparing > against the FsApp start time and degrading the allowedLocalityLevel. This > problem has a larger negative impact on small jobs. YARN-4399 also discusses > some locality-related problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
[ https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051072#comment-15051072 ] Hadoop QA commented on YARN-4440: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 1s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s {color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 29s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 29s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 32s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 27s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 18m 32s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12776819/YARN-4440.001.patch | | JIRA Issue | YARN-4440 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux
[jira] [Assigned] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications
[ https://issues.apache.org/jira/browse/YARN-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-4441: - Assignee: Sunil G (was: Mohammad Shahid Khan) > Kill application request from the webservice(ui) is showing success even for > the finished applications > -- > > Key: YARN-4441 > URL: https://issues.apache.org/jira/browse/YARN-4441 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mohammad Shahid Khan >Assignee: Sunil G > > If the application is already finished, i.e. either failed, killed, or succeeded, > the kill operation should not be logged as success. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4442) create new application operation from the Rest API should be logged in the audit log
Mohammad Shahid Khan created YARN-4442: -- Summary: create new application operation from the Rest API should be logged in the audit log Key: YARN-4442 URL: https://issues.apache.org/jira/browse/YARN-4442 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 3.0.0 Reporter: Mohammad Shahid Khan Assignee: Mohammad Shahid Khan The create-new-application operation from the REST API ("/ws/v1/cluster/apps/new-application") should be logged in the audit log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications
[ https://issues.apache.org/jira/browse/YARN-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051117#comment-15051117 ] Sunil G commented on YARN-4441: --- Hi [~mohdshahidkhan], this is intentional. If the application is already in a stored final state, we do not return an error, since the application is going to be finished/completed immediately. I am not really sure whether this is what you were expecting; kindly share more details otherwise. > Kill application request from the webservice(ui) is showing success even for > the finished applications > -- > > Key: YARN-4441 > URL: https://issues.apache.org/jira/browse/YARN-4441 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > If the application is already finished, i.e. either failed, killed, or succeeded, > the kill operation should not be logged as success. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3102) Decommisioned Nodes not listed in Web UI
[ https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051121#comment-15051121 ] Kuhu Shukla commented on YARN-3102: --- Following the discussion from YARN-4402: Given that we consider the exclude list the canonical truth for decommissioned nodes (which means the {{setDecomissionedNMsMetrics}} call during serviceInit is kept as is), the only way to make these nodes part of the inactiveNodes map (which today is reinitialized to a new empty concurrent map in {{RMActiveServiceContext}} during startup) is to read the node hostnames/IPs from the exclude list and add them to this map, even though we lose the port information. This is because the node would ideally not have the NM process running, and we don't keep that state across RM restarts. What that means is that we add a (NodeId, RMNode) entry where the hostname is legal but the port is a defined invalid value like -1. This allows us to track the nodes that were decommissioned in the previous life cycle of the RM. We can also tweak the GUI to display N/A when the port is -1. Since the {{isValidNode}} check is only on the basis of hostname/IP, this does not affect the rejoining behavior of the node. Requesting [~eepayne] for comments and ideas.
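The proposal above can be sketched as follows: exclude-list hostnames become inactive-node entries with a defined invalid port, and the UI renders N/A for that port. This is a simplified stand-in (a plain host-to-port map) for the real NodeId/RMNode types in the resourcemanager module; the names here are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DecommissionedNodeSketch {
    // Sentinel for "port unknown", as proposed in the comment above.
    static final int UNKNOWN_PORT = -1;

    // host -> last known port; UNKNOWN_PORT when re-read from the exclude list.
    static final Map<String, Integer> inactiveNodes = new ConcurrentHashMap<>();

    /** Re-populate inactive nodes from exclude-list hostnames after an RM restart. */
    static void addDecommissionedHost(String host) {
        inactiveNodes.put(host, UNKNOWN_PORT);
    }

    /** What the web UI's port column would show for such a node. */
    static String displayPort(int port) {
        return port == UNKNOWN_PORT ? "N/A" : Integer.toString(port);
    }
}
```

Because validity checks are keyed on hostname/IP only, a node re-registering after restart would simply replace the sentinel entry with its real port.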
> Decommisioned Nodes not listed in Web UI > > > Key: YARN-3102 > URL: https://issues.apache.org/jira/browse/YARN-3102 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 > Environment: 2 Node Manager and 1 Resource Manager >Reporter: Bibin A Chundatt >Assignee: Kuhu Shukla >Priority: Minor > > Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to > yarn.exlude file In RM1 machine > Add Yarn.exclude with NM1 Host Name > Start the node as listed below NM1,NM2 Resource manager > Now check Nodes decommisioned in /cluster/nodes > Number of decommisioned node is listed as 1 but Table is empty in > /cluster/nodes/decommissioned (detail of Decommision node not shown) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications
Mohammad Shahid Khan created YARN-4441: -- Summary: Kill application request from the webservice(ui) is showing success even for the finished applications Key: YARN-4441 URL: https://issues.apache.org/jira/browse/YARN-4441 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Mohammad Shahid Khan Assignee: Mohammad Shahid Khan If the application is already finished, i.e. either failed, killed, or succeeded, the kill operation should not be logged as success. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4399: Attachment: YARN-4399.002.patch Resolves the compile error. > FairScheduler allocated container should resetSchedulingOpportunities count > of its priority > --- > > Key: YARN-4399 > URL: https://issues.apache.org/jira/browse/YARN-4399 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4399.001.patch, YARN-4399.002.patch > > > There is a bug in FairScheduler's container allocation when the locality configs > are set. When you attempt to assign a container, > {{FSAppAttempt#addSchedulingOpportunity}} is invoked whether or not the > assignment succeeds. If you configure > yarn.scheduler.fair.locality.threshold.node and > yarn.scheduler.fair.locality.threshold.rack, the schedulingOpportunity value > influences the locality of containers: if one container is assigned > successfully, its priority's schedulingOpportunity count is increased, and a > second container increases it again. This may degrade that priority's > allowedLocality and cause the container to be handled as a rack request. So I > think that when FairScheduler allocates a container, if the previous container > was assigned, its priority's scheduling count should be reset to 0 so the value > does not influence container allocation in the next iteration; this will > increase the locality of containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications
[ https://issues.apache.org/jira/browse/YARN-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4441: -- Assignee: Mohammad Shahid Khan (was: Sunil G) > Kill application request from the webservice(ui) is showing success even for > the finished applications > -- > > Key: YARN-4441 > URL: https://issues.apache.org/jira/browse/YARN-4441 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > If the application is already finished, i.e. either failed, killed, or succeeded, > the kill operation should not be logged as success. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3876) get_executable() assumes everything is Linux
[ https://issues.apache.org/jira/browse/YARN-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison reassigned YARN-3876: --- Assignee: Alan Burlison > get_executable() assumes everything is Linux > > > Key: YARN-3876 > URL: https://issues.apache.org/jira/browse/YARN-3876 > Project: Hadoop YARN > Issue Type: Sub-task > Components: build >Affects Versions: 2.7.0 >Reporter: Alan Burlison >Assignee: Alan Burlison > > get_executable() in container-executor.c is non-portable and is hard-coded to > assume Linux's /proc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4399: Attachment: YARN-4399.003.patch Resolves the remaining compile error. > FairScheduler allocated container should resetSchedulingOpportunities count > of its priority > --- > > Key: YARN-4399 > URL: https://issues.apache.org/jira/browse/YARN-4399 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4399.001.patch, YARN-4399.002.patch, > YARN-4399.003.patch > > > There is a bug in FairScheduler's container allocation when the locality configs > are set. When you attempt to assign a container, > {{FSAppAttempt#addSchedulingOpportunity}} is invoked whether or not the > assignment succeeds. If you configure > yarn.scheduler.fair.locality.threshold.node and > yarn.scheduler.fair.locality.threshold.rack, the schedulingOpportunity value > influences the locality of containers: if one container is assigned > successfully, its priority's schedulingOpportunity count is increased, and a > second container increases it again. This may degrade that priority's > allowedLocality and cause the container to be handled as a rack request. So I > think that when FairScheduler allocates a container, if the previous container > was assigned, its priority's scheduling count should be reset to 0 so the value > does not influence container allocation in the next iteration; this will > increase the locality of containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
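The change proposed in YARN-4399, resetting a priority's scheduling-opportunity count once a container is actually assigned, can be sketched with a minimal stand-in for the real FSAppAttempt counters. The class and method names are illustrative, not the actual Hadoop code:

```java
import java.util.HashMap;
import java.util.Map;

public class SchedulingOpportunities {
    // Per-priority count of scheduling attempts since the last reset.
    private final Map<Integer, Integer> opportunities = new HashMap<>();

    /** Invoked on every scheduling attempt at this priority (current behavior). */
    void addSchedulingOpportunity(int priority) {
        opportunities.merge(priority, 1, Integer::sum);
    }

    int get(int priority) {
        return opportunities.getOrDefault(priority, 0);
    }

    /**
     * The proposed change: called after a successful assignment so that an
     * inflated count does not relax locality for the next container.
     */
    void resetSchedulingOpportunities(int priority) {
        opportunities.put(priority, 0);
    }
}
```

Without the reset, the count keeps growing across successful assignments and crosses the locality thresholds sooner than intended, letting containers fall back to rack-local placement.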
[jira] [Commented] (YARN-4100) Add Documentation for Distributed Node Labels feature
[ https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051180#comment-15051180 ] Naganarasimha G R commented on YARN-4100: - Hi [~dian.fu], [~devaraj.k], [~wangda], & [~rohithsharma], Can one or all of you take a look at the documentation update in this JIRA? It would be better for it to go in as part of 2.8.0, as the features are already checked in. > Add Documentation for Distributed Node Labels feature > - > > Key: YARN-4100 > URL: https://issues.apache.org/jira/browse/YARN-4100 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: NodeLabel.html, YARN-4100.v1.001.patch > > > Add Documentation for Distributed Node Labels feature
[jira] [Updated] (YARN-4100) Add Documentation for Distributed and Delegated-Centralized Node Labels feature
[ https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4100: Summary: Add Documentation for Distributed and Delegated-Centralized Node Labels feature (was: Add Documentation for Distributed Node Labels feature) > Add Documentation for Distributed and Delegated-Centralized Node Labels > feature > --- > > Key: YARN-4100 > URL: https://issues.apache.org/jira/browse/YARN-4100 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: NodeLabel.html, YARN-4100.v1.001.patch > > > Add Documentation for Distributed Node Labels feature
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051186#comment-15051186 ] Hadoop QA commented on YARN-3226: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 26s {color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} Patch generated 3 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 105, now 106). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 28s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 16s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 149m 22s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL |