[jira] [Assigned] (YARN-6075) Yarn top for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-6075: -- Assignee: Yufei Gu > Yarn top for FairScheduler > -- > > Key: YARN-6075 > URL: https://issues.apache.org/jira/browse/YARN-6075 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Prabhu Joseph >Assignee: Yufei Gu > Attachments: Yarn_Top_FairScheduler.png > > > Yarn top output for FairScheduler shows empty values. (attached output) We > need to handle yarn top with FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6073) Misuse of format specifier in Preconditions.checkArgument
[ https://issues.apache.org/jira/browse/YARN-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814173#comment-15814173 ] Hudson commented on YARN-6073: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #11096 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11096/]) YARN-6073. Misuse of format specifier in Preconditions.checkArgument (templedf: rev 6332a318bc1e2e9d73d7159eab26347bb3f1f9b3) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java > Misuse of format specifier in Preconditions.checkArgument > - > > Key: YARN-6073 > URL: https://issues.apache.org/jira/browse/YARN-6073 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yuanbo Liu >Priority: Trivial > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-6073.001.patch > > > RMAdminCLI.java > {code} > int nLabels = map.get(nodeId).size(); > Preconditions.checkArgument(nLabels <= 1, "%d labels specified on > host=%s" > + ", please note that we do not support specifying multiple" > + " labels on a single host for now.", nLabels, nodeIdStr); > {code} > The {{%d}} should be replaced with {{%s}}, per > https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Preconditions.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
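For reference, a minimal sketch of the corrected call from the snippet above. Guava's Preconditions message template substitutes only {{%s}} placeholders; a {{%d}} is left in the message verbatim and its argument is appended to the end instead. {{map}}, {{nodeId}} and {{nodeIdStr}} are assumed to be defined as in RMAdminCLI:

{code}
import com.google.common.base.Preconditions;

// Guava's checkArgument template only understands %s; with %d the
// placeholder stays literal and the extra argument gets appended in
// brackets at the end of the message.
int nLabels = map.get(nodeId).size();
Preconditions.checkArgument(nLabels <= 1, "%s labels specified on host=%s"
    + ", please note that we do not support specifying multiple"
    + " labels on a single host for now.", nLabels, nodeIdStr);
{code}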
[jira] [Comment Edited] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814163#comment-15814163 ] zhangyubiao edited comment on YARN-5995 at 1/10/17 7:17 AM: Thanks [~sunilg]. IMO, Histogram belongs to the com.codahale.metrics package while MutableCounter belongs to the org.apache.hadoop.metrics2.lib package; if we mix these two metric types, it will mess up the metrics lib. So I think it's better to write a MutableHistogram/MutableTimeHistogram modeled on HBase's MutableHistogram and MutableTimeHistogram. What do you think, [~sunilg]? > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814163#comment-15814163 ] zhangyubiao commented on YARN-5995: --- Thanks [~sunilg]. IMO, Histogram belongs to the com.codahale.metrics package while MutableCounter belongs to the org.apache.hadoop.metrics2.lib package; if we mix these two metric types, it will mess up the metrics lib. So I think it's better to write a MutableHistogram/MutableTimeHistogram modeled on HBase's MutableHistogram and MutableTimeHistogram. What do you think, [~sunilg]? > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
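As a concrete reference for the metric types under discussion, here is a minimal sketch, assuming only the existing org.apache.hadoop.metrics2 API, of what a store-level metric source could look like. The class and metric names are hypothetical; MutableQuantiles stands in for the HBase-style MutableHistogram being proposed, so nothing from com.codahale.metrics is pulled in:

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableQuantiles;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Hypothetical metric source; names are illustrative only.
@Metrics(context = "yarn")
public class RMStateStoreOpMetrics {
  private final MetricsRegistry registry =
      new MetricsRegistry("RMStateStoreOpMetrics");

  @Metric("Total state-store write ops")
  private MutableCounterLong numWriteOps;
  @Metric("Total failed state-store write ops")
  private MutableCounterLong numFailedWriteOps;
  // Running average + sample count of write latency.
  @Metric("State-store write latency")
  private MutableRate writeLatency;
  // Latency distribution without a codahale dependency.
  private MutableQuantiles writeLatencyQuantiles;

  public RMStateStoreOpMetrics() {
    // 60s rollover interval for the quantile estimator.
    writeLatencyQuantiles = registry.newQuantiles("writeLatencyQuantiles",
        "latency of state-store write ops", "ops", "latencyMs", 60);
    DefaultMetricsSystem.instance().register("RMStateStoreOpMetrics",
        "RM state store operation metrics", this);
  }

  public void writeSucceeded(long elapsedMs) {
    numWriteOps.incr();
    writeLatency.add(elapsedMs);
    writeLatencyQuantiles.add(elapsedMs);
  }

  public void writeFailed() {
    numFailedWriteOps.incr();
  }
}
{code}

A store implementation would call writeSucceeded()/writeFailed() around each ZK/FS write, keeping the instrumentation out of the dispatcher itself.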
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814127#comment-15814127 ] Hadoop QA commented on YARN-6072: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 19s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 75 unchanged - 0 fixed = 76 total (was 75) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 40m 52s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 63m 27s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6072 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846492/YARN-6072.01.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux a3c667d13668 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 945db55 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/14616/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/14616/artifact/patchprocess/whitespace-eol.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14616/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output |
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814049#comment-15814049 ] Sunil G commented on YARN-5995: --- Yes. Thanks for clarifying, [~jianhe]. So ideally we can support:
- #total write ops
- #total failed ops
- time cost of each write op (average)
- write latency
The first two are counters, hence we can use MutableCounter. The latter two may suit a histogram better. [~piaoyu zhang], pls share your thoughts. > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6073) Misuse of format specifier in Preconditions.checkArgument
[ https://issues.apache.org/jira/browse/YARN-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814032#comment-15814032 ] Yuanbo Liu commented on YARN-6073: -- [~templedf] Thanks for your review. > Misuse of format specifier in Preconditions.checkArgument > - > > Key: YARN-6073 > URL: https://issues.apache.org/jira/browse/YARN-6073 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yuanbo Liu >Priority: Trivial > Attachments: YARN-6073.001.patch > > > RMAdminCLI.java > {code} > int nLabels = map.get(nodeId).size(); > Preconditions.checkArgument(nLabels <= 1, "%d labels specified on > host=%s" > + ", please note that we do not support specifying multiple" > + " labels on a single host for now.", nLabels, nodeIdStr); > {code} > The {{%d}} should be replaced with {{%s}}, per > https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Preconditions.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated YARN-6072: -- Attachment: YARN-6072.01.branch-2.patch > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.patch, YARN-6072.01.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > 
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in the following order: > # EmbeddedElector > # AdminService > During resource manager service start(), EmbeddedElector starts first and > invokes {{AdminService#refreshAll()}}, but {{AdminService#serviceStart()}} > happens after {{ActiveStandbyElectorBasedElectorService}} service start is
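To make the failure mode concrete, here is a minimal sketch, not the committed fix, of the kind of defensive check implied by the stack trace. It assumes AdminService's existing LOG and admin-RPC server field, and simply skips the ACL refresh when the election callback arrives before serviceStart() has created the server:

{code}
// Hypothetical guard inside AdminService (sketch only): refreshAll() can
// be driven by the elector before serviceStart() runs, so this.server
// may still be null at that point.
private synchronized void refreshServiceAcls(Configuration configuration,
    PolicyProvider policyProvider) {
  if (this.server == null) {
    LOG.warn("Admin RPC server not started yet; skipping service ACL refresh");
    return;
  }
  this.server.refreshServiceAclWithLoadedConfiguration(configuration,
      policyProvider);
}
{code}

The alternative design, reordering service registration so AdminService starts before the elector, is what the description above points at; either way the refresh must not run against a half-initialized service.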
[jira] [Updated] (YARN-6022) Revert changes of AbstractResourceRequest
[ https://issues.apache.org/jira/browse/YARN-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-6022: --- Attachment: (was: YARN-6022.branch-2.006.patch) > Revert changes of AbstractResourceRequest > - > > Key: YARN-6022 > URL: https://issues.apache.org/jira/browse/YARN-6022 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-6022.001.patch, YARN-6022.002.patch, > YARN-6022.003.patch, YARN-6022.004.patch, YARN-6022.005.patch, > YARN-6022.branch-2.005.patch, YARN-6022.branch-2.006.patch > > > YARN-5774 added AbstractResourceRequest to make easier internal scheduler > change, this is not a correct approach: For example, with this change, we > need to make AbstractResourceRequest to be public/stable. And end users can > use it like: > {code} > AbstractResourceRequest request = ... > request.setCapability(...) > {code} > But AbstractResourceRequest should not be visible by application at all. > We need to revert it from branch-2.8 / branch-2 / trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6022) Revert changes of AbstractResourceRequest
[ https://issues.apache.org/jira/browse/YARN-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-6022: --- Attachment: YARN-6022.branch-2.006.patch > Revert changes of AbstractResourceRequest > - > > Key: YARN-6022 > URL: https://issues.apache.org/jira/browse/YARN-6022 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-6022.001.patch, YARN-6022.002.patch, > YARN-6022.003.patch, YARN-6022.004.patch, YARN-6022.005.patch, > YARN-6022.branch-2.005.patch, YARN-6022.branch-2.006.patch > > > YARN-5774 added AbstractResourceRequest to make easier internal scheduler > change, this is not a correct approach: For example, with this change, we > need to make AbstractResourceRequest to be public/stable. And end users can > use it like: > {code} > AbstractResourceRequest request = ... > request.setCapability(...) > {code} > But AbstractResourceRequest should not be visible by application at all. > We need to revert it from branch-2.8 / branch-2 / trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6073) Misuse of format specifier in Preconditions.checkArgument
[ https://issues.apache.org/jira/browse/YARN-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814006#comment-15814006 ] Daniel Templeton commented on YARN-6073: Patch looks good to me. +1 > Misuse of format specifier in Preconditions.checkArgument > - > > Key: YARN-6073 > URL: https://issues.apache.org/jira/browse/YARN-6073 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yuanbo Liu >Priority: Trivial > Attachments: YARN-6073.001.patch > > > RMAdminCLI.java > {code} > int nLabels = map.get(nodeId).size(); > Preconditions.checkArgument(nLabels <= 1, "%d labels specified on > host=%s" > + ", please note that we do not support specifying multiple" > + " labels on a single host for now.", nLabels, nodeIdStr); > {code} > The {{%d}} should be replaced with {{%s}}, per > https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Preconditions.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6062) nodemanager memory leak
[ https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813955#comment-15813955 ] Varun Saxena commented on YARN-6062: Is NM recovery enabled? > nodemanager memory leak > --- > > Key: YARN-6062 > URL: https://issues.apache.org/jira/browse/YARN-6062 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: gehaijiang > Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 8986 data 20 0 21.3g 19g 7376 S 5.5 20.7 2458:09 java > 38432 data 20 0 9.8g 7.9g 6300 S 95.5 8.4 35273:23 java > 6653 data 20 0 4558m 3.4g 10m S 9.2 3.6 6640:37 java > $ jps > 6653 NodeManager > NodeManager memory keeps going up, reaching 10 GB. > NodeManager yarn-env.sh configuration (2 GB heap): > YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m > -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6008) Fetch container list for failed application attempt
[ https://issues.apache.org/jira/browse/YARN-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813936#comment-15813936 ] David Yan commented on YARN-6008: - Thanks, I'm just a YARN user, not exactly a dev, at least not yet. :) > Fetch container list for failed application attempt > --- > > Key: YARN-6008 > URL: https://issues.apache.org/jira/browse/YARN-6008 > Project: Hadoop YARN > Issue Type: Bug > Environment: hadoop version 2.6 >Reporter: Priyanka Gugale > > When we run the command "yarn container -list" with a failed application > attempt, we should either get the containers from that attempt or get back an > empty list, as the containers are no longer in the running state. > Steps to reproduce: > 1. Launch a yarn application. > 2. Kill the app master; it tries to restart the application with a new attempt id. > 3. Now run the yarn command, > yarn container -list <Application Attempt ID> > where the Application Attempt ID is of the failed attempt; > it lists the containers from the next attempt, which is in the "RUNNING" state right > now. > Expected behavior: > It should return the list of killed containers from attempt 1 or an empty list. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
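For reproducing the check programmatically, a minimal sketch using the public YarnClient API; the attempt-id string is a placeholder, and ApplicationAttemptId.fromString exists on recent branches (on 2.6 one would use ConverterUtils.toApplicationAttemptId instead):

{code}
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ListAttemptContainers {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    // Placeholder attempt id for a failed attempt.
    ApplicationAttemptId attemptId = ApplicationAttemptId.fromString(
        "appattempt_1484000000000_0001_000001");
    // Per the expected behavior above, this should come back empty or
    // contain only that attempt's (completed) containers.
    List<ContainerReport> containers = client.getContainers(attemptId);
    for (ContainerReport report : containers) {
      System.out.println(report.getContainerId() + " "
          + report.getContainerState());
    }
    client.stop();
  }
}
{code}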
[jira] [Commented] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp
[ https://issues.apache.org/jira/browse/YARN-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813915#comment-15813915 ] Tsuyoshi Ozawa commented on YARN-3774: -- Thanks Jordan for the notification! I think we should use 3.3.0, 2.12.0 or later. > ZKRMStateStore should use Curator 3.0 and avail CuratorOp > - > > Key: YARN-3774 > URL: https://issues.apache.org/jira/browse/YARN-3774 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > > YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are > somewhat involved, and could be improved using CuratorOp introduced in > Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version > and make this change. > Curator is considering shading guava through CURATOR-200. In Hadoop 3, we > should upgrade to the next Curator version. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813903#comment-15813903 ] Jian He commented on YARN-5995: --- Sorry, I meant that an external tool (such as the Ambari Metrics Server) can store these metrics as long as the RM emits them in the ideal way. I don't actually mean to store these metrics in this jira. > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated YARN-6072: -- Attachment: YARN-6072.01.patch attaching patch for trunk. will update for branch 2 and 2.8 shortly > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > 
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > # AdminService > During resource manager service start() .EmbeddedElector starts first and > invokes {{AdminService#refreshAll()}} but {{AdminService#serviceStart()}} > happens after
[jira] [Commented] (YARN-6062) nodemanager memory leak
[ https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813788#comment-15813788 ] Bibin A Chundatt commented on YARN-6062: [~gehaijiang] Could you please have a look at [YARN-6017|https://issues.apache.org/jira/browse/YARN-6017?focusedCommentId=15812167&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15812167]? Is it possible to change the JRE version and check, by any chance? > nodemanager memory leak > --- > > Key: YARN-6062 > URL: https://issues.apache.org/jira/browse/YARN-6062 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: gehaijiang > Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 8986 data 20 0 21.3g 19g 7376 S 5.5 20.7 2458:09 java > 38432 data 20 0 9.8g 7.9g 6300 S 95.5 8.4 35273:23 java > 6653 data 20 0 4558m 3.4g 10m S 9.2 3.6 6640:37 java > $ jps > 6653 NodeManager > NodeManager memory keeps going up, reaching 10 GB. > NodeManager yarn-env.sh configuration (2 GB heap): > YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m > -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813739#comment-15813739 ] Sunil G commented on YARN-5995: --- Thanks [~jianhe] for chipping in. I also agree about the read ops; we can skip those for now. As mentioned, I think we can redefine the list of metrics as below:
- #total write operations
- Time-series metric to store the cost of each write op (time duration taken for each operation). A histogram will suit here, I think.
- Write latency (data written for each op)
But I am also thinking along similar lines. If at time t1 these data are stored as (#total write ops, time cost of write ops, data written in the last time interval), then we can store the same info at time t2 as well. If we store all these data in a time-series way, it may make external modules' job easier, but it is a complex metrics module to design internally. Do you think we need to jump to this immediately, or can we do something like the 3 metrics I mentioned earlier? Thoughts, [~jianhe]? > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3955) Support for application priority ACLs in queues of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813710#comment-15813710 ] Sunil G commented on YARN-3955: --- Thank you [~leftnoteasy] for the review and commit. Thanks [~jianhe] for additional reviews. > Support for application priority ACLs in queues of CapacityScheduler > > > Key: YARN-3955 > URL: https://issues.apache.org/jira/browse/YARN-3955 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: ApplicationPriority-ACL.pdf, > ApplicationPriority-ACLs-v2.pdf, YARN-3955.0001.patch, YARN-3955.0002.patch, > YARN-3955.0003.patch, YARN-3955.0004.patch, YARN-3955.0005.patch, > YARN-3955.0006.patch, YARN-3955.0007.patch, YARN-3955.0008.patch, > YARN-3955.0009.patch, YARN-3955.0010.patch, YARN-3955.v0.patch, > YARN-3955.v1.patch, YARN-3955.wip1.patch > > > Support will be added for User-level access permission to use different > application-priorities. This is to avoid situations where all users try > running max priority in the cluster and thus degrading the value of > priorities. > Access Control Lists can be set per priority level within each queue. Below > is an example configuration that can be added in capacity scheduler > configuration > file for each Queue level. > yarn.scheduler.capacity.root...acl=user1,user2 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813705#comment-15813705 ] Sunil G commented on YARN-5889: --- Thank you, Eric. Yes, I missed this. Will update in the next patch. > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.v0.patch, YARN-5889.v1.patch, > YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock held. To improve performance, this ticket focuses on moving the > user-limit calculation out of the heartbeat allocation flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
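The optimization described reduces to a classic cache-and-invalidate pattern. A generic illustration, not the actual patch: compute the limit once, reuse it on the allocation hot path, and invalidate whenever apps or cluster resources change (a stale read here only costs one recomputation):

{code}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only; field and method names are hypothetical.
public class CachedUserLimit {
  // Bumped whenever apps are added/removed or cluster resources change.
  private final AtomicLong version = new AtomicLong();
  private volatile long computedForVersion = -1;
  private volatile long userLimitMb;

  public long get() {
    long v = version.get();
    if (computedForVersion != v) {
      // Expensive calculation, previously done on every heartbeat
      // under the scheduler write lock.
      userLimitMb = computeUserLimitMb();
      computedForVersion = v;
    }
    return userLimitMb;
  }

  public void invalidate() {
    version.incrementAndGet();
  }

  private long computeUserLimitMb() {
    return 0L; // placeholder for the real user-limit calculation
  }
}
{code}

Concurrent callers may recompute redundantly, which is harmless; the point is that the heartbeat path usually hits the cached value.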
[jira] [Commented] (YARN-5980) Update documentation for single node hbase deploy
[ https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813673#comment-15813673 ] Vrushali C commented on YARN-5980: -- Thanks Sangjin! Will make these changes. The table-specific coprocessor requires dynamic loading of coprocessors to work for classes starting with org.apache.hadoop. That was allowed (fixed) only in HBase 1.2.x, so we should use that version now. Let me confirm that this works fine, and I will update the patch further to include all the changes. > Update documentation for single node hbase deploy > - > > Key: YARN-5980 > URL: https://issues.apache.org/jira/browse/YARN-5980 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Labels: yarn-5355-merge-blocker > Attachments: YARN-5980.001.patch, YARN-5980.002.patch, > YARN-5980.003.patch > > > Per HBASE-17272, a single node hbase deployment (single jvm running daemons + > hdfs writes) will be added to hbase shortly. > We should update the timeline service documentation in the setup/deployment > context accordingly, this will help users who are a bit wary of hbase > deployments help get started with timeline service more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4148) When killing app, RM releases app's resource before they are released by NM
[ https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813658#comment-15813658 ] Hudson commented on YARN-4148: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11095 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11095/]) YARN-4148. When killing app, RM releases app's resource before they are (junping_du: rev 945db55f2e6521d33d4f90bbb09179b0feba5e7a) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java > When killing app, RM releases app's resource before they are released by NM > --- > > Key: YARN-4148 > URL: https://issues.apache.org/jira/browse/YARN-4148 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jason Lowe > Attachments: YARN-4148.001.patch, YARN-4148.002.patch, > YARN-4148.003.patch, YARN-4148.wip.patch, > free_in_scheduler_but_not_node_prototype-branch-2.7.patch > > > When killing an app, the RM scheduler releases the app's resources as soon as > possible, and then it might allocate these resources to new requests. But the > NM has not released them at that time. > The problem was found when we supported GPU as a resource (YARN-4122). Test > environment: an NM had 6 GPUs, app A used all 6 GPUs, and app B was requesting 3 > GPUs. We killed app A, then the RM released A's 6 GPUs and allocated 3 GPUs to B. > But when B tried to start a container on the NM, the NM found it didn't have 3 GPUs > to allocate because it had not yet released A's GPUs. > I think the problem also exists for CPU/memory. It might cause OOM when > memory is overused. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4148) When killing app, RM releases app's resource before they are released by NM
[ https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813644#comment-15813644 ] Junping Du commented on YARN-4148: -- I just committed the 003 patch to trunk and branch-2. For branch-2.8, there are several conflicts there. Hi [~jlowe], do we want this commit to land in branch-2.8? If so, can you put up a patch here for branch-2.8? Thx! > When killing app, RM releases app's resource before they are released by NM > --- > > Key: YARN-4148 > URL: https://issues.apache.org/jira/browse/YARN-4148 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jason Lowe > Attachments: YARN-4148.001.patch, YARN-4148.002.patch, > YARN-4148.003.patch, YARN-4148.wip.patch, > free_in_scheduler_but_not_node_prototype-branch-2.7.patch > > > When killing an app, the RM scheduler releases the app's resources as soon as > possible, and then it might allocate these resources to new requests. But the > NM has not released them at that time. > The problem was found when we supported GPU as a resource (YARN-4122). Test > environment: an NM had 6 GPUs, app A used all 6 GPUs, and app B was requesting 3 > GPUs. We killed app A, then the RM released A's 6 GPUs and allocated 3 GPUs to B. > But when B tried to start a container on the NM, the NM found it didn't have 3 GPUs > to allocate because it had not yet released A's GPUs. > I think the problem also exists for CPU/memory. It might cause OOM when > memory is overused. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6075) Yarn top for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-6075: --- Component/s: (was: resourcemanager) > Yarn top for FairScheduler > -- > > Key: YARN-6075 > URL: https://issues.apache.org/jira/browse/YARN-6075 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Prabhu Joseph > Attachments: Yarn_Top_FairScheduler.png > > > Yarn top output for FairScheduler shows empty values. (attached output) We > need to handle yarn top with FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6062) nodemanager memory leak
[ https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813562#comment-15813562 ] gehaijiang commented on YARN-6062: -- The pid 6653 process was restarted. The cluster's NodeManagers have memory leaks. > nodemanager memory leak > --- > > Key: YARN-6062 > URL: https://issues.apache.org/jira/browse/YARN-6062 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: gehaijiang > Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 8986 data 20 0 21.3g 19g 7376 S 5.5 20.7 2458:09 java > 38432 data 20 0 9.8g 7.9g 6300 S 95.5 8.4 35273:23 java > 6653 data 20 0 4558m 3.4g 10m S 9.2 3.6 6640:37 java > $ jps > 6653 NodeManager > NodeManager memory keeps going up, reaching 10 GB. > NodeManager yarn-env.sh configuration (2 GB heap): > YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m > -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6062) nodemanager memory leak
[ https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813556#comment-15813556 ] gehaijiang commented on YARN-6062: -- java version "1.7.0_65" Java(TM) SE Runtime Environment (build 1.7.0_65-b17) Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode) > nodemanager memory leak > --- > > Key: YARN-6062 > URL: https://issues.apache.org/jira/browse/YARN-6062 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: gehaijiang > Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 8986 data 20 0 21.3g 19g 7376 S 5.5 20.7 2458:09 java > 38432 data 20 0 9.8g 7.9g 6300 S 95.5 8.4 35273:23 java > 6653 data 20 0 4558m 3.4g 10m S 9.2 3.6 6640:37 java > $ jps > 6653 NodeManager > NodeManager memory keeps going up, reaching 10 GB. > NodeManager yarn-env.sh configuration (2 GB heap): > YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m > -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
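One observation on the quoted yarn-env.sh line, offered as a hedged suggestion rather than a confirmed fix: only -Xms is set, so the JVM's default maximum heap applies and heap growth alone could explain part of the RES growth. Pinning -Xmx makes it easier to tell heap growth from a native/off-heap leak (the attached smaps output is where the latter would show up):

{code}
# Illustrative yarn-env.sh change: cap the NM heap explicitly so that
# RES growth far beyond 2 GB points at native/off-heap memory instead.
YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmx2048m -Xmn768m \
 -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps \
 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
{code}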
[jira] [Commented] (YARN-6011) Add a new web service to list the files on a container in AHSWebService
[ https://issues.apache.org/jira/browse/YARN-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813533#comment-15813533 ] Junping Du commented on YARN-6011: -- Thanks [~xgong] for delivering the patch. A couple of comments:
1. For generating the URI embedded in the response of getContainerLogsInfo, I saw some very similar code in other places, like getLogs(). Can we refactor the code a bit to reuse the same logic?
2. For getContainerLogsInfo(), if an app is not in a running or finished state, we will return a bad request. However, I remember that in our ATS implementation, the RM after restart could send a regressed application state event to ATS, like an app-creation event for an app which was running before. Can you double check that ATS's app status won't regress? Otherwise, we shouldn't just simply return a bad request.
3. For getContainerLogMeta(), I remember I had some previous comments on refactoring the code (consolidating similar logic, especially the log reader) in previous JIRAs. How is that effort going? If it is not a short-term priority for you, please add a TODO here - maybe someone else reading this part of the code could help with that.
The rest looks good to me. > Add a new web service to list the files on a container in AHSWebService > --- > > Key: YARN-6011 > URL: https://issues.apache.org/jira/browse/YARN-6011 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-6011.1.patch, YARN-6011.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5937) stop-yarn.sh is not able to gracefully stop node managers
[ https://issues.apache.org/jira/browse/YARN-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813490#comment-15813490 ] Weiwei Yang commented on YARN-5937: --- Perfect, thank you [~Naganarasimha] :) > stop-yarn.sh is not able to gracefully stop node managers > - > > Key: YARN-5937 > URL: https://issues.apache.org/jira/browse/YARN-5937 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Labels: script > Attachments: YARN-5937.01.patch, nm_shutdown.log > > > stop-yarn.sh always gives the following output > {code} > ./sbin/stop-yarn.sh > Stopping resourcemanager > Stopping nodemanagers > : WARNING: nodemanager did not stop gracefully after 5 seconds: > Trying to kill with kill -9 > : ERROR: Unable to kill 18097 > {code} > This is because the resource manager is stopped before the node managers: when the > shutdown hook manager tries to gracefully stop NM services, the NM needs to > unregister with the RM, and it times out as the NM cannot connect to the RM > (already stopped). See the log (stop the RM, then run kill): > {code} > 16/11/28 08:26:43 ERROR nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM > ... > 16/11/28 08:26:53 WARN util.ShutdownHookManager: ShutdownHook > 'CompositeServiceShutdownHook' timeout, java.util.concurrent.TimeoutException > java.util.concurrent.TimeoutException > at java.util.concurrent.FutureTask.get(FutureTask.java:205) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67) > ... > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:291) > ... > 16/11/28 08:27:13 ERROR util.ShutdownHookManager: ShutdownHookManger shutdown > forcefully. > {code} > The shutdown hook has a default 10s timeout, so if the RM is stopped before > the NMs, they always take more than 10s to stop (in the Java code). However, > stop-yarn.sh only gives a 5s timeout, so the NM is always killed instead of stopped. > It would make sense to stop the NMs before the RM in this script, in a graceful way. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
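A minimal sketch of the ordering fix being proposed, assuming the trunk-era shell scripts where `yarn --daemon stop` and the `--workers` option exist: stopping the NMs while the RM is still up lets the unregister call complete within the shutdown-hook timeout.

{code}
# Stop node managers first, while the RM can still serve the unregister
# RPC, then stop the resource manager (option names per trunk scripts).
"${HADOOP_YARN_HOME}/bin/yarn" --config "${HADOOP_CONF_DIR}" \
    --workers --daemon stop nodemanager
"${HADOOP_YARN_HOME}/bin/yarn" --config "${HADOOP_CONF_DIR}" \
    --daemon stop resourcemanager
{code}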
[jira] [Commented] (YARN-5980) Update documentation for single node hbase deploy
[ https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813393#comment-15813393 ] Sangjin Lee commented on YARN-5980: --- Thanks [~vrushalic] for your patch! Most of the comments are minor (except the last one).
- l.184: it might be better if the URL is enclosed in parentheses: "for your setup (http://...)."
- l.191: "on standalone HBase" -> "on the standalone HBase setup"
- l.192: "they persist" -> "it persists"
- l.194: for consistency, let's use the fixed width font for hbase-site.xml: `hbase-site.xml`
- l.195: likewise, the hbase.rootdir -> the `hbase.rootdir` property
- l.196: likewise, hbase.cluster.distributed -> `hbase.cluster.distributed`
- l.212: let's add a period at the end
- l.234: Have we changed the coprocessor to a dynamic one? I don't think we made that change (yet)?
> Update documentation for single node hbase deploy > - > > Key: YARN-5980 > URL: https://issues.apache.org/jira/browse/YARN-5980 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Labels: yarn-5355-merge-blocker > Attachments: YARN-5980.001.patch, YARN-5980.002.patch, > YARN-5980.003.patch > > > Per HBASE-17272, a single node hbase deployment (single jvm running daemons + > hdfs writes) will be added to hbase shortly. > We should update the timeline service documentation in the setup/deployment > context accordingly, this will help users who are a bit wary of hbase > deployments help get started with timeline service more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
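For the properties called out at l.195-196 above, a sketch of the corresponding hbase-site.xml fragment. The NameNode URI is a placeholder; `hbase.rootdir` and `hbase.cluster.distributed` are real HBase properties, and keeping the latter false while pointing the rootdir at HDFS is exactly the single-JVM-plus-HDFS-writes deployment from HBASE-17272:

{code}
<configuration>
  <!-- Persist HBase data on HDFS even in standalone mode;
       the NameNode host/port below is a placeholder. -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.com:8020/hbase</value>
  </property>
  <!-- false = standalone: all HBase daemons run in a single JVM. -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
</configuration>
{code}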
[jira] [Commented] (YARN-6076) Backport YARN-4752 (FS preemption changes) to branch-2
[ https://issues.apache.org/jira/browse/YARN-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813388#comment-15813388 ] Hadoop QA commented on YARN-6076: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 8 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 41s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 22s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s{color} | {color:green} branch-2 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 25s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 20s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 33s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} branch-2 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 32s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 23s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 41s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 37s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 37s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 46s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 13 new + 173 unchanged - 145 fixed = 186 total (was 318) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 25s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 21s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_111. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_111 with JDK v1.8.0_111 generated 0 new + 913 unchanged - 8 fixed = 913 total (was 921) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 42s{color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_121. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 25s{color} | {color:red}
[jira] [Commented] (YARN-5980) Update documentation for single node hbase deploy
[ https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813380#comment-15813380 ] Hadoop QA commented on YARN-5980: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 13m 43s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-5980 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846457/YARN-5980.003.patch | | Optional Tests | asflicense mvnsite | | uname | Linux a619eb2f90fc 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 603cbcd | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14615/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Update documentation for single node hbase deploy > - > > Key: YARN-5980 > URL: https://issues.apache.org/jira/browse/YARN-5980 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Labels: yarn-5355-merge-blocker > Attachments: YARN-5980.001.patch, YARN-5980.002.patch, > YARN-5980.003.patch > > > Per HBASE-17272, a single node hbase deployment (single jvm running daemons + > hdfs writes) will be added to hbase shortly. > We should update the timeline service documentation in the setup/deployment > context accordingly, this will help users who are a bit wary of hbase > deployments help get started with timeline service more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers
[ https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813375#comment-15813375 ] Wangda Tan commented on YARN-5764: -- Thanks [~devaraj.k] for updating the design doc and patch, some questions/comments: 1) What is the benefit of manually specifying the NUMA node? Since this is potentially complex for end users to specify, I think it's better to read the data directly from the OS. 2) Do the changes work on platforms other than Linux? 3) I'm not quite sure whether this could happen: with this patch, YARN will launch processes one by one on each NUMA node to bind memory/cpu. Is it possible that another process (outside of YARN) uses memory on a NUMA node, causing processes launched by YARN to fail to bind or run? 4) This patch uses hard binding (get the allocated resources on the specified node or fail); would it be better to offer soft binding (prefer the node, but also accept other nodes)? I think soft binding should be the default behavior for NUMA support. Thoughts? > NUMA awareness support for launching containers > --- > > Key: YARN-5764 > URL: https://issues.apache.org/jira/browse/YARN-5764 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, yarn >Reporter: Olasoji >Assignee: Devaraj K > Attachments: NUMA Awareness for YARN Containers.pdf, > YARN-5764-v0.patch, YARN-5764-v1.patch > > > The purpose of this feature is to improve Hadoop performance by minimizing > costly remote memory accesses on non-SMP systems. Yarn containers, on launch, > will be pinned to a specific NUMA node and all subsequent memory allocations > will be served by the same node, reducing remote memory accesses. The current > default behavior is to spread memory across all NUMA nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
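To make the hard/soft distinction in question 4 concrete, here is a minimal sketch, assuming containers are launched through a numactl prefix. The helper class and method are hypothetical (not from the attached patches); only the numactl flags are real.

{code}
import java.util.ArrayList;
import java.util.List;

public class NumaCommandBuilder {
  /**
   * Wraps a container launch command with numactl.
   * Hard binding (--membind) fails when the node lacks free memory;
   * soft binding (--preferred) prefers the node but falls back to
   * remote memory under pressure.
   */
  public static List<String> wrap(List<String> command, int numaNode,
      boolean strict) {
    List<String> wrapped = new ArrayList<>();
    wrapped.add("numactl");
    wrapped.add("--cpunodebind=" + numaNode);
    wrapped.add(strict ? "--membind=" + numaNode
                       : "--preferred=" + numaNode);
    wrapped.addAll(command);
    return wrapped;
  }
}
{code}

Under soft binding, the interference scenario from question 3 degrades locality instead of failing the container, which is why it is the safer default.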
[jira] [Updated] (YARN-5980) Update documentation for single node hbase deploy
[ https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-5980: - Attachment: YARN-5980.003.patch Uploading 003, which fixes the end-of-line whitespace reported by Hadoop QA. > Update documentation for single node hbase deploy > - > > Key: YARN-5980 > URL: https://issues.apache.org/jira/browse/YARN-5980 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Labels: yarn-5355-merge-blocker > Attachments: YARN-5980.001.patch, YARN-5980.002.patch, > YARN-5980.003.patch > > > Per HBASE-17272, a single node hbase deployment (single jvm running daemons + > hdfs writes) will be added to hbase shortly. > We should update the timeline service documentation in the setup/deployment > context accordingly; this will help users who are a bit wary of hbase > deployments get started with timeline service more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5980) Update documentation for single node hbase deploy
[ https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813313#comment-15813313 ] Hadoop QA commented on YARN-5980: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 13m 58s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-5980 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846451/YARN-5980.002.patch | | Optional Tests | asflicense mvnsite | | uname | Linux 217c144f598e 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 603cbcd | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/14613/artifact/patchprocess/whitespace-eol.txt | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14613/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Update documentation for single node hbase deploy > - > > Key: YARN-5980 > URL: https://issues.apache.org/jira/browse/YARN-5980 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Labels: yarn-5355-merge-blocker > Attachments: YARN-5980.001.patch, YARN-5980.002.patch > > > Per HBASE-17272, a single node hbase deployment (single jvm running daemons + > hdfs writes) will be added to hbase shortly. > We should update the timeline service documentation in the setup/deployment > context accordingly, this will help users who are a bit wary of hbase > deployments help get started with timeline service more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813302#comment-15813302 ] Junping Du commented on YARN-6072: -- [~jianhe]'s comments are pretty persuasive. I will wait this issue get resolved before kicking off 2.8.0 RC. [~ajithshetty], as [~ka...@cloudera.com] mentioned above, please let us know your plan and we can help to take over if you have other priorities. Thanks! > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > # AdminService > During
[jira] [Updated] (YARN-6022) Revert changes of AbstractResourceRequest
[ https://issues.apache.org/jira/browse/YARN-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-6022: --- Attachment: YARN-6022.branch-2.006.patch Oops. Screwed up patch 5. Hopefully this works better. > Revert changes of AbstractResourceRequest > - > > Key: YARN-6022 > URL: https://issues.apache.org/jira/browse/YARN-6022 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-6022.001.patch, YARN-6022.002.patch, > YARN-6022.003.patch, YARN-6022.004.patch, YARN-6022.005.patch, > YARN-6022.branch-2.005.patch, YARN-6022.branch-2.006.patch > > > YARN-5774 added AbstractResourceRequest to make internal scheduler changes > easier, but this is not a correct approach: with this change, we would have to > make AbstractResourceRequest public/stable, and end users could use it like: > {code} > AbstractResourceRequest request = ... > request.setCapability(...) > {code} > But AbstractResourceRequest should not be visible to applications at all. > We need to revert it from branch-2.8 / branch-2 / trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
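For background on why public/stable exposure matters here: Hadoop gates what applications may depend on with audience and stability annotations. The sketch below is illustrative only (the class name is altered, and the actual fix in this JIRA is to revert the class, not annotate it).

{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// A class left @Private/@Unstable is fair game for internal refactoring;
// anything reachable from @Public types, such as ResourceRequest, must
// instead honor compatibility guarantees across releases.
@InterfaceAudience.Private
@InterfaceStability.Unstable
public abstract class AbstractResourceRequestSketch {
  // internal helpers shared by scheduler-side request types would live here
}
{code}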
[jira] [Updated] (YARN-5980) Update documentation for single node hbase deploy
[ https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-5980: - Attachment: YARN-5980.002.patch Uploading patch 002. This updates the hbase version and also makes the steps a bit clearer, I think. > Update documentation for single node hbase deploy > - > > Key: YARN-5980 > URL: https://issues.apache.org/jira/browse/YARN-5980 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Labels: yarn-5355-merge-blocker > Attachments: YARN-5980.001.patch, YARN-5980.002.patch > > > Per HBASE-17272, a single node hbase deployment (single jvm running daemons + > hdfs writes) will be added to hbase shortly. > We should update the timeline service documentation in the setup/deployment > context accordingly; this will help users who are a bit wary of hbase > deployments get started with timeline service more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads
[ https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813265#comment-15813265 ] Hadoop QA commented on YARN-6061: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 16s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 21s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 23m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6061 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846437/YARN-6061.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux e4b5d55dccba 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 91bf504 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/14611/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14611/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14611/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add a customized uncaughtexceptionhandler for critical threads > -- > > Key: YARN-6061 > URL: https://issues.apache.org/jira/browse/YARN-6061 > Project: Hadoop YARN > Issue Type: Improvement > Components:
[jira] [Commented] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue
[ https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813260#comment-15813260 ] Daniel Templeton commented on YARN-5554: Sorry. Looks like I started repeating myself. TMJ! (Too Many JIRAs!) Fine by me. +1. I'll commit when I get a chance. > MoveApplicationAcrossQueues does not check user permission on the target queue > -- > > Key: YARN-5554 > URL: https://issues.apache.org/jira/browse/YARN-5554 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Haibo Chen >Assignee: Wilfred Spiegelenburg > Labels: oct16-medium > Attachments: YARN-5554.10.patch, YARN-5554.11.patch, > YARN-5554.12.patch, YARN-5554.13.patch, YARN-5554.14.patch, > YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, YARN-5554.5.patch, > YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, YARN-5554.9.patch > > > The moveApplicationAcrossQueues operation currently does not check the user's > permission on the target queue. This incorrectly allows one user to move > his/her own applications to a queue that the user has no access to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
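The gap described in this issue amounts to a missing ACL check against the target queue before the move. A minimal sketch of that check, assuming the scheduler's checkAccess API; the wrapper class itself is hypothetical, not the attached patch.

{code}
import java.security.AccessControlException;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.QueueACL;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.YarnScheduler;

class TargetQueueAclCheck {
  private final YarnScheduler scheduler;

  TargetQueueAclCheck(YarnScheduler scheduler) {
    this.scheduler = scheduler;
  }

  // Moving an app amounts to submitting it to the target queue, so the
  // caller needs submit rights there, not just ownership of the app.
  void verifyTargetQueueAccess(UserGroupInformation callerUGI,
      String targetQueue) {
    if (!scheduler.checkAccess(callerUGI, QueueACL.SUBMIT_APPLICATIONS,
        targetQueue)) {
      throw new AccessControlException("User "
          + callerUGI.getShortUserName()
          + " cannot submit applications to queue " + targetQueue);
    }
  }
}
{code}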
[jira] [Updated] (YARN-6076) Backport YARN-4752 (FS preemption changes) to branch-2
[ https://issues.apache.org/jira/browse/YARN-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-6076: --- Attachment: yarn-6076-branch-2.1.patch > Backport YARN-4752 (FS preemption changes) to branch-2 > -- > > Key: YARN-6076 > URL: https://issues.apache.org/jira/browse/YARN-6076 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-6076-branch-2.1.patch > > > YARN-4752 was merged to trunk a while ago and has been stable. Creating this > JIRA to merge it into branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3427) Remove deprecated methods from ResourceCalculatorProcessTree
[ https://issues.apache.org/jira/browse/YARN-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3427: --- Target Version/s: 3.0.0-beta1 (was: ) Priority: Blocker (was: Major) Hadoop Flags: Incompatible change > Remove deprecated methods from ResourceCalculatorProcessTree > > > Key: YARN-3427 > URL: https://issues.apache.org/jira/browse/YARN-3427 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Blocker > > In 2.7, we made ResourceCalculatorProcessTree Public and exposed some > existing ill-formed methods as deprecated ones for use by Tez. > We should remove them in 3.0.0, considering that the methods have been > deprecated for all the 2.x.y releases in which the class is marked Public. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-453) Optimize FS internal data structures
[ https://issues.apache.org/jira/browse/YARN-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-453: -- Assignee: (was: Karthik Kambatla) > Optimize FS internal data structures > > > Key: YARN-453 > URL: https://issues.apache.org/jira/browse/YARN-453 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Karthik Kambatla > > FS uses lists to store internal queues and sorts them on every heartbeat, > leading to unnecessary scheduling overhead. > A choice of better data structures should reduce the latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
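One direction for the data-structure change proposed above: keep schedulable entities in an incrementally maintained ordered structure rather than re-sorting a list per heartbeat. A sketch under assumed, simplified types; nothing below is actual FairScheduler code.

{code}
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;

class OrderedApps {
  static final class App {
    final long id;
    volatile long demand;
    App(long id, long demand) { this.id = id; this.demand = demand; }
  }

  // Order by demand descending, tie-broken by id so distinct apps are
  // never considered equal by the set.
  private final ConcurrentSkipListSet<App> apps =
      new ConcurrentSkipListSet<>(
          Comparator.<App>comparingLong(a -> a.demand).reversed()
              .thenComparingLong(a -> a.id));

  void add(App a) { apps.add(a); }

  // O(log n) re-position when one app's demand changes, instead of an
  // O(n log n) full sort on every node heartbeat.
  void updateDemand(App a, long newDemand) {
    apps.remove(a);   // must remove before mutating the sort key
    a.demand = newDemand;
    apps.add(a);
  }

  App neediest() { return apps.isEmpty() ? null : apps.first(); }
}
{code}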
[jira] [Updated] (YARN-809) Enable better parallelism in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-809: -- Assignee: (was: Karthik Kambatla) > Enable better parallelism in the Fair Scheduler > --- > > Key: YARN-809 > URL: https://issues.apache.org/jira/browse/YARN-809 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sandy Ryza > > Currently, the Fair Scheduler is locked on pretty much every operation: node > updates, application additions and removals, every time the update thread > runs, and every time the RM queries it for information. Most of this locking > is unnecessary, especially as only the core scheduling operations like > application additions, removals, and node updates need a consistent view of > scheduler state. > We can probably increase parallelism by using concurrent data structures where > applicable, as well as keeping a slightly stale view to serve via the RM > APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
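One way to realize the "slightly stale view" idea above is a published snapshot that query paths read without taking the scheduling lock. A minimal sketch with assumed, simplified types (not FairScheduler code):

{code}
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class SchedulerView {
  // Immutable snapshot, replaced wholesale by the update thread.
  private final AtomicReference<Map<String, Long>> queueUsage =
      new AtomicReference<>(Collections.emptyMap());

  // Called under the scheduler lock by the update thread.
  void publish(Map<String, Long> latestUsage) {
    queueUsage.set(Collections.unmodifiableMap(latestUsage));
  }

  // Called lock-free from RM web/IPC query paths; the result may be a
  // heartbeat or two stale, which is fine for informational APIs.
  Map<String, Long> read() {
    return queueUsage.get();
  }
}
{code}

The design choice here is to trade freshness for contention: readers never block the scheduling path, and writers pay only a pointer swap.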
[jira] [Commented] (YARN-4148) When killing app, RM releases app's resource before they are released by NM
[ https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813168#comment-15813168 ] Junping Du commented on YARN-4148: -- Yes, I agree that the two test failures are not related to the patch. Thanks [~jlowe] for reminding me. Committing it now. > When killing app, RM releases app's resource before they are released by NM > --- > > Key: YARN-4148 > URL: https://issues.apache.org/jira/browse/YARN-4148 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jason Lowe > Attachments: YARN-4148.001.patch, YARN-4148.002.patch, > YARN-4148.003.patch, YARN-4148.wip.patch, > free_in_scheduler_but_not_node_prototype-branch-2.7.patch > > > When killing an app, the RM scheduler releases the app's resources as soon as > possible and might then allocate these resources to new requests. But the NM > has not released them at that time. > The problem was found when we supported GPU as a resource (YARN-4122). Test > environment: an NM had 6 GPUs, app A used all 6 GPUs, and app B was requesting 3 > GPUs. After killing app A, the RM released A's 6 GPUs and allocated 3 GPUs to B. > But when B tried to start a container on the NM, the NM found it didn't have 3 > GPUs to allocate because it had not yet released A's GPUs. > I think the problem also exists for CPU/Memory. It might cause OOM when > memory is overused. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
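The race described here suggests holding back a killed app's resources until the NM confirms its containers are gone. The following is a sketch of that idea only, not of the attached patches, with deliberately simplified types (String container ids, integer resource units):

{code}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class DeferredRelease {
  private final Map<String, Integer> pendingRelease =
      new ConcurrentHashMap<>();
  private int allocatable; // free units visible to the scheduler

  // On kill: the resource stays counted as in use.
  synchronized void onContainerKilled(String containerId, int units) {
    pendingRelease.put(containerId, units);
  }

  // On NM heartbeat: only containers the node reports complete free
  // their resources, so GPUs/memory can't be double-booked.
  synchronized void onNodeHeartbeat(List<String> completedOnNode) {
    for (String id : completedOnNode) {
      Integer units = pendingRelease.remove(id);
      if (units != null) {
        allocatable += units;
      }
    }
  }
}
{code}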
[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads
[ https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813135#comment-15813135 ] Yufei Gu commented on YARN-6061: Had an offline discussion with [~kasha]. We agreed to enlarge the scope to the whole yarn project. A new patch has been uploaded. > Add a customized uncaughtexceptionhandler for critical threads > -- > > Key: YARN-6061 > URL: https://issues.apache.org/jira/browse/YARN-6061 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6061.001.patch > > > There are several threads in the fair scheduler. A thread will quit when there > is a runtime exception inside it. We should bring down the RM when that > happens. Otherwise, there may be some weird behavior in the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
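A minimal sketch of such a handler, assuming Hadoop's ExitUtil and slf4j logging; the class name is hypothetical, and the real patch may route through RM fatal-event dispatch instead of exiting directly.

{code}
import org.apache.hadoop.util.ExitUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CriticalThreadUncaughtExceptionHandler
    implements Thread.UncaughtExceptionHandler {
  private static final Logger LOG =
      LoggerFactory.getLogger(CriticalThreadUncaughtExceptionHandler.class);

  @Override
  public void uncaughtException(Thread t, Throwable e) {
    // A dead scheduler thread leaves the RM in an undefined state;
    // failing fast is safer than limping along.
    LOG.error("Critical thread " + t.getName() + " crashed", e);
    ExitUtil.terminate(-1, e);
  }
}
{code}

A critical thread would then opt in with {{thread.setUncaughtExceptionHandler(new CriticalThreadUncaughtExceptionHandler())}} before being started.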
[jira] [Updated] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads
[ https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6061: --- Attachment: YARN-6061.001.patch > Add a customized uncaughtexceptionhandler for critical threads > -- > > Key: YARN-6061 > URL: https://issues.apache.org/jira/browse/YARN-6061 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6061.001.patch > > > There are several threads in the fair scheduler. A thread will quit when there > is a runtime exception inside it. We should bring down the RM when that > happens. Otherwise, there may be some weird behavior in the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6061) Add a customized uncaughtexceptionhandler for critical thread
[ https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6061: --- Summary: Add a customized uncaughtexceptionhandler for critical thread (was: Add a customized uncaughtexceptionhandler for fair scheduler) > Add a customized uncaughtexceptionhandler for critical thread > - > > Key: YARN-6061 > URL: https://issues.apache.org/jira/browse/YARN-6061 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Yufei Gu >Assignee: Yufei Gu > > There are several threads in the fair scheduler. A thread will quit when there > is a runtime exception inside it. We should bring down the RM when that > happens. Otherwise, there may be some weird behavior in the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads
[ https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6061: --- Summary: Add a customized uncaughtexceptionhandler for critical threads (was: Add a customized uncaughtexceptionhandler for critical thread) > Add a customized uncaughtexceptionhandler for critical threads > -- > > Key: YARN-6061 > URL: https://issues.apache.org/jira/browse/YARN-6061 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Yufei Gu >Assignee: Yufei Gu > > There are several threads in the fair scheduler. A thread will quit when there > is a runtime exception inside it. We should bring down the RM when that > happens. Otherwise, there may be some weird behavior in the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6061) Add a customized uncaughtexceptionhandler for fair scheduler
[ https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6061: --- Component/s: (was: fairscheduler) > Add a customized uncaughtexceptionhandler for fair scheduler > > > Key: YARN-6061 > URL: https://issues.apache.org/jira/browse/YARN-6061 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Yufei Gu >Assignee: Yufei Gu > > There are several threads in the fair scheduler. A thread will quit when there > is a runtime exception inside it. We should bring down the RM when that > happens. Otherwise, there may be some weird behavior in the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6061) Add a customized uncaughtexceptionhandler for fair scheduler
[ https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6061: --- Labels: (was: fairscheduler) > Add a customized uncaughtexceptionhandler for fair scheduler > > > Key: YARN-6061 > URL: https://issues.apache.org/jira/browse/YARN-6061 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler, yarn >Reporter: Yufei Gu >Assignee: Yufei Gu > > There are several threads in the fair scheduler. A thread will quit when there > is a runtime exception inside it. We should bring down the RM when that > happens. Otherwise, there may be some weird behavior in the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4148) When killing app, RM releases app's resource before they are released by NM
[ https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813092#comment-15813092 ] Jason Lowe commented on YARN-4148: -- The unit test failures appear to be unrelated. They pass for me locally with the patch applied, and there are JIRAs that are tracking those failures. The TestDelegationTokenRenewer failure is being tracked by YARN-5816 and the TestRMRestart failure is tracked by YARN-5548. Thanks for the review, [~djp]! If you agree the failures are unrelated then feel free to commit, or I'll do so in a few days unless I hear otherwise. > When killing app, RM releases app's resource before they are released by NM > --- > > Key: YARN-4148 > URL: https://issues.apache.org/jira/browse/YARN-4148 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jason Lowe > Attachments: YARN-4148.001.patch, YARN-4148.002.patch, > YARN-4148.003.patch, YARN-4148.wip.patch, > free_in_scheduler_but_not_node_prototype-branch-2.7.patch > > > When killing an app, the RM scheduler releases the app's resources as soon as > possible and might then allocate these resources to new requests. But the NM > has not released them at that time. > The problem was found when we supported GPU as a resource (YARN-4122). Test > environment: an NM had 6 GPUs, app A used all 6 GPUs, and app B was requesting 3 > GPUs. After killing app A, the RM released A's 6 GPUs and allocated 3 GPUs to B. > But when B tried to start a container on the NM, the NM found it didn't have 3 > GPUs to allocate because it had not yet released A's GPUs. > I think the problem also exists for CPU/Memory. It might cause OOM when > memory is overused. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813047#comment-15813047 ] Karthik Kambatla edited comment on YARN-6072 at 1/9/17 10:32 PM: - My vote would be to play it safe and fix it in 2.8.0. I am happy to review the changes. [~ajithshetty] - if you are unable to get to this in the next couple of days, please let me know so I can pick it up. was (Author: kasha): My vote would be to safe and fix it in 2.8.0. I am happy to review the changes. [~ajithshetty] - if you are unable to get to this in the next couple of days, please let me know so I can pick it up. > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at >
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813047#comment-15813047 ] Karthik Kambatla commented on YARN-6072: My vote would be to safe and fix it in 2.8.0. I am happy to review the changes. [~ajithshetty] - if you are unable to get to this in the next couple of days, please let me know so I can pick it up. > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > # AdminService > During resource manager service start() .EmbeddedElector starts
[jira] [Updated] (YARN-5976) Update hbase version to 1.2 (removes phoenix dependencies)
[ https://issues.apache.org/jira/browse/YARN-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-5976: - Summary: Update hbase version to 1.2 (removes phoenix dependencies) (was: Update hbase version to 1.2) > Update hbase version to 1.2 (removes phoenix dependencies) > -- > > Key: YARN-5976 > URL: https://issues.apache.org/jira/browse/YARN-5976 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vrushali C >Assignee: Vrushali C > Labels: yarn-5355-merge-blocker > Fix For: 3.0.0-alpha2 > > Attachments: YARN-5976-YARN-5355.004.patch, YARN-5976.001.wip.patch, > YARN-5976.002.wip.patch, YARN-5976.004.patch > > > I believe phoenix now works with hbase 1.2, so we should upgrade timeline > service to use hbase 1.2 now. > We should also update the timelineservice documentation to reflect that > running all hbase daemons in a single jvm while writing to hdfs is supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6000) Make AllocationFileLoaderService.Listener public
[ https://issues.apache.org/jira/browse/YARN-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812994#comment-15812994 ] Sergey Shelukhin edited comment on YARN-6000 at 1/9/17 10:06 PM: - Thanks! As for the Hive requirements, see the code snippet above. We are using the listener because that seems to be the only way to get the updated value out. We just need to get the allocConf that we use to get the queue policy, and then get the queue. was (Author: sershe): Thanks! As for Hive requirements, see the code snippet above. We are using the listener because that seems to be the only way to get the updated value out. We just need to get the allocConf/queue > Make AllocationFileLoaderService.Listener public > > > Key: YARN-6000 > URL: https://issues.apache.org/jira/browse/YARN-6000 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, yarn >Affects Versions: 3.0.0-alpha1 >Reporter: Tao Jie >Assignee: Tao Jie > Fix For: 3.0.0-alpha2 > > Attachments: YARN-6000.001.patch > > > We removed the public modifier of {{AllocationFileLoaderService.Listener}} in > YARN-4997 since it triggered a findbugs warning. However, it breaks Hive code in > {{FairSchedulerShim}}: > {code} > AllocationFileLoaderService allocsLoader = new AllocationFileLoaderService(); > allocsLoader.init(conf); > allocsLoader.setReloadListener(new AllocationFileLoaderService.Listener() > { > @Override > public void onReload(AllocationConfiguration allocs) { > allocConf.set(allocs); > } > }); > try { > allocsLoader.reloadAllocations(); > } catch (Exception ex) { > throw new IOException("Failed to load queue allocations", ex); > } > if (allocConf.get() == null) { > allocConf.set(new AllocationConfiguration(conf)); > } > QueuePlacementPolicy queuePolicy = allocConf.get().getPlacementPolicy(); > if (queuePolicy != null) { > requestedQueue = queuePolicy.assignAppToQueue(requestedQueue, userName); > {code} > As a result, we should set the modifier back to public. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6000) Make AllocationFileLoaderService.Listener public
[ https://issues.apache.org/jira/browse/YARN-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812994#comment-15812994 ] Sergey Shelukhin commented on YARN-6000: Thanks! As for Hive requirements, see the code snippet above. We are using the listener because that seems to be the only way to get the updated value out. We just need to get the allocConf/queue > Make AllocationFileLoaderService.Listener public > > > Key: YARN-6000 > URL: https://issues.apache.org/jira/browse/YARN-6000 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, yarn >Affects Versions: 3.0.0-alpha1 >Reporter: Tao Jie >Assignee: Tao Jie > Fix For: 3.0.0-alpha2 > > Attachments: YARN-6000.001.patch > > > We removed the public modifier of {{AllocationFileLoaderService.Listener}} in > YARN-4997 since it triggered a findbugs warning. However, it breaks Hive code in > {{FairSchedulerShim}}: > {code} > AllocationFileLoaderService allocsLoader = new AllocationFileLoaderService(); > allocsLoader.init(conf); > allocsLoader.setReloadListener(new AllocationFileLoaderService.Listener() > { > @Override > public void onReload(AllocationConfiguration allocs) { > allocConf.set(allocs); > } > }); > try { > allocsLoader.reloadAllocations(); > } catch (Exception ex) { > throw new IOException("Failed to load queue allocations", ex); > } > if (allocConf.get() == null) { > allocConf.set(new AllocationConfiguration(conf)); > } > QueuePlacementPolicy queuePolicy = allocConf.get().getPlacementPolicy(); > if (queuePolicy != null) { > requestedQueue = queuePolicy.assignAppToQueue(requestedQueue, userName); > {code} > As a result, we should set the modifier back to public. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812907#comment-15812907 ] Yufei Gu commented on YARN-4022: Hi [~forrestchen], are you still working on this? If not, can I take it? > queue not remove from webpage(/cluster/scheduler) when delete queue in > xxx-scheduler.xml > > > Key: YARN-4022 > URL: https://issues.apache.org/jira/browse/YARN-4022 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: forrestchen > Labels: oct16-medium, scheduler > Attachments: YARN-4022.001.patch, YARN-4022.002.patch, > YARN-4022.003.patch, YARN-4022.004.patch > > > When I delete an existing queue by modifying the xxx-scheduler.xml, I can still > see the queue information block in the webpage (/cluster/scheduler), though the > 'Min Resources' items all become zero and there is no 'Max Running > Applications' item. > I can still submit an application to the deleted queue and the application > will run using the 'root.default' queue instead, but submitting to a > non-existent queue will cause an exception. > My expectation is that the deleted queue will not be displayed in the webpage > and submitting an application to the deleted queue will behave as if the queue > doesn't exist. > PS: There's no application running in the queue I deleted. > Some related config in yarn-site.xml: > {code} > <property> > <name>yarn.scheduler.fair.user-as-default-queue</name> > <value>false</value> > </property> > <property> > <name>yarn.scheduler.fair.allow-undeclared-pools</name> > <value>false</value> > </property> > {code} > a related question is here: > http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812903#comment-15812903 ] Jian He commented on YARN-6072: --- YARN-5709 actually affected the sequence of start. Before YARN-5709, ActiveStandbyElector is created inside AdminService, so it is guaranteed that the server variable is instantiated before ActiveStandbyElector is started. After YARN-5709, this is not the case any more. > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > # AdminService > During
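The NullPointerException above comes from the elector winning the election and driving {{AdminService#refreshAll()}} before {{AdminService#serviceStart()}} has created its RPC server. A minimal sketch of the guard this implies is below; the class shape, field, and message are simplifying assumptions for illustration, not the actual YARN-6072 patch.

{code}
import org.apache.hadoop.ha.ServiceFailedException;
import org.apache.hadoop.ipc.Server;

// Illustrative only: make the startup-ordering assumption explicit instead
// of letting refreshServiceAcls() dereference a null server.
class AdminServiceSketch {
  private volatile Server server; // created in serviceStart()

  void transitionToActive() throws ServiceFailedException {
    if (server == null) {
      // The elector callback arrived before serviceStart() ran; fail with a
      // clear error instead of the NullPointerException seen in the log above.
      throw new ServiceFailedException(
          "AdminService not started yet; cannot refresh service ACLs");
    }
    // refreshServiceAcls(), refreshQueues(), etc. would follow here.
  }
}
{code}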
[jira] [Commented] (YARN-6057) yarn.scheduler.minimum-allocation-* descriptions are incorrect about behavior when a request is out of bounds
[ https://issues.apache.org/jira/browse/YARN-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812832#comment-15812832 ] Hadoop QA commented on YARN-6057: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 30s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 21m 46s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6057 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846407/YARN-6057.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml | | uname | Linux 0ed77c2a909e 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 91bf504 | | Default Java | 1.8.0_111 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14610/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14610/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > yarn.scheduler.minimum-allocation-* descriptions are incorrect about behavior > when a request is out of bounds > - > > Key: YARN-6057 > URL: https://issues.apache.org/jira/browse/YARN-6057 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Julia Sommer >Priority: Minor > Attachments: YARN-6057.001.patch, YARN-6057.002.patch > > > {code} > <property> > <description>The minimum allocation for every container request at the RM, > in terms of virtual CPU cores. Requests lower than this will throw a > InvalidResourceRequestException.</description> > <name>yarn.scheduler.minimum-allocation-vcores</name> > <value>1</value> > </property> > {code} > *Requests lower than this will throw a
[jira] [Updated] (YARN-6057) yarn.scheduler.minimum-allocation-* descriptions are incorrect about behavior when a request is out of bounds
[ https://issues.apache.org/jira/browse/YARN-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julia Sommer updated YARN-6057: --- Attachment: YARN-6057.002.patch Corrected version attached. Thanks! > yarn.scheduler.minimum-allocation-* descriptions are incorrect about behavior > when a request is out of bounds > - > > Key: YARN-6057 > URL: https://issues.apache.org/jira/browse/YARN-6057 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Julia Sommer >Priority: Minor > Attachments: YARN-6057.001.patch, YARN-6057.002.patch > > > {code} > <property> > <description>The minimum allocation for every container request at the RM, > in terms of virtual CPU cores. Requests lower than this will throw a > InvalidResourceRequestException.</description> > <name>yarn.scheduler.minimum-allocation-vcores</name> > <value>1</value> > </property> > {code} > *Requests lower than this will throw a InvalidResourceRequestException.* > In fact, an InvalidResourceRequestException is only thrown when the maximum > allocation for vcores or memory is exceeded -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
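For reference, the behavior the corrected descriptions document can be sketched as follows. This is a simplified stand-in, not the actual {{SchedulerUtils}} code: requests below the minimum are normalized up to it, and only requests above the maximum are rejected (the real code throws {{InvalidResourceRequestException}} and also rounds to the configured increment).

{code}
// Simplified stand-in for the scheduler's request validation.
final class AllocationBoundsSketch {
  static int normalizeVcores(int requested, int min, int max) {
    if (requested > max) {
      // Only requests above the maximum are rejected.
      throw new IllegalArgumentException(
          "Requested vcores " + requested + " exceed the maximum " + max);
    }
    // Requests below the minimum are rounded up, not rejected.
    return Math.max(requested, min);
  }

  public static void main(String[] args) {
    System.out.println(normalizeVcores(0, 1, 32)); // prints 1; no exception
  }
}
{code}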
[jira] [Updated] (YARN-6077) /bin/bash path is hardcoded in node manager
[ https://issues.apache.org/jira/browse/YARN-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-6077: - Description: We need a configuration entry similar to MRJobConfig.MAPRED_ADMIN_USER_SHELL to support multiple environments like FreeBSD. was: There should be a configuration similar to MRJobConfig.MAPRED_ADMIN_USER_SHELL > /bin/bash path is hardcoded in node manager > --- > > Key: YARN-6077 > URL: https://issues.apache.org/jira/browse/YARN-6077 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi > > We need a configuration entry similar to MRJobConfig.MAPRED_ADMIN_USER_SHELL > to support multiple environments like FreeBSD. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
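A sketch of what such a configuration entry could look like. The key name {{yarn.nodemanager.admin-user-shell}} is a hypothetical placeholder chosen here only to mirror {{MRJobConfig.MAPRED_ADMIN_USER_SHELL}}; the JIRA does not fix a name yet.

{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical config key; falls back to the currently hard-coded path.
final class NodeManagerShellSketch {
  static final String NM_ADMIN_USER_SHELL =
      "yarn.nodemanager.admin-user-shell"; // invented for illustration
  static final String DEFAULT_NM_ADMIN_USER_SHELL = "/bin/bash";

  static String getShell(Configuration conf) {
    return conf.get(NM_ADMIN_USER_SHELL, DEFAULT_NM_ADMIN_USER_SHELL);
  }
}
{code}

With a default of /bin/bash, existing deployments keep their behavior while platforms such as FreeBSD can point the key at their own shell location.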
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812737#comment-15812737 ] Eric Payne commented on YARN-5889: -- [~sunilg], bq. However, I do think {{reComputeUserLimits}} needs to be modified to update {{preComputedUserLimit}} when recomputing user limits. That is not done anywhere in the current patch. Actually, we may want to have separate methods for {{reComputeUserLimits}} and {{reComputeActiveUserLimits}}. > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.v0.patch, YARN-5889.v1.patch, > YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket is focusing on moving > user-limit calculation out of the heartbeat allocation flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
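To make the pre-computed user-limit idea concrete, here is a rough sketch under simplifying assumptions (one cached limit per partition, placeholder arithmetic). The method and field names echo the comment above, but the body is illustrative, not the YARN-5889 patch.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache the computed limit so the heartbeat allocation path only reads it;
// recompute outside the hot path when membership or capacity changes.
class UserLimitCacheSketch {
  private final Map<String, Long> preComputedUserLimit =
      new ConcurrentHashMap<>();

  long getUserLimit(String partition) {
    return preComputedUserLimit.getOrDefault(partition, 0L);
  }

  void reComputeUserLimits(String partition, long capacity, int activeUsers) {
    // Placeholder formula: a real implementation weighs user-limit-factor,
    // minimum-user-limit-percent, and consumed resources.
    preComputedUserLimit.put(partition, capacity / Math.max(activeUsers, 1));
  }
}
{code}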
[jira] [Commented] (YARN-6077) /bin/bash path is hardcoded in node manager
[ https://issues.apache.org/jira/browse/YARN-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812693#comment-15812693 ] Miklos Szegedi commented on YARN-6077: -- This bug was opened based on the discussion with [~aw] in YARN-6060. > /bin/bash path is hardcoded in node manager > --- > > Key: YARN-6077 > URL: https://issues.apache.org/jira/browse/YARN-6077 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi > > There should be a configuration similar to MRJobConfig.MAPRED_ADMIN_USER_SHELL -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec
[ https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812689#comment-15812689 ] Miklos Szegedi commented on YARN-6060: -- I opened HADOOP-13963 and YARN-6077. Since I am only familiar with YARN, do you mind if I pick up YARN-6077? > Linux container executor fails to run container on directories mounted as > noexec > > > Key: YARN-6060 > URL: https://issues.apache.org/jira/browse/YARN-6060 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, yarn >Reporter: Miklos Szegedi > Attachments: YARN-6060.000.patch, YARN-6060.001.patch > > > If node manager directories are mounted as noexec, LCE fails with the > following error: > Launching container... > Couldn't execute the container launch file > /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh > - Permission denied -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812676#comment-15812676 ] Jian He commented on YARN-5995: --- Actually, this would be most useful as a time-series metric that an external framework can consume to show the values over time - to show when the RM incurs high write latencies, as we always do postmortem analysis. If so, we can merely output the absolute value of 'time cost for each store op' or 'amount of data written for each op', and an external tool can use these metrics to plot them over time. > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6057) yarn.scheduler.minimum-allocation-* descriptions are incorrect about behavior when a request is out of bounds
[ https://issues.apache.org/jira/browse/YARN-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-6057: --- Summary: yarn.scheduler.minimum-allocation-* descriptions are incorrect about behavior when a request is out of bounds (was: yarn.scheduler.minimum-allocation-* and yarn.scheduler.maximum-allocation-* descriptions are incorrect about behavior when a request is out of bounds) > yarn.scheduler.minimum-allocation-* descriptions are incorrect about behavior > when a request is out of bounds > - > > Key: YARN-6057 > URL: https://issues.apache.org/jira/browse/YARN-6057 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Julia Sommer >Priority: Minor > Attachments: YARN-6057.001.patch > > > {code} > <property> > <description>The minimum allocation for every container request at the RM, > in terms of virtual CPU cores. Requests lower than this will throw a > InvalidResourceRequestException.</description> > <name>yarn.scheduler.minimum-allocation-vcores</name> > <value>1</value> > </property> > {code} > *Requests lower than this will throw a InvalidResourceRequestException.* > In fact, an InvalidResourceRequestException is only thrown when the maximum > allocation for vcores or memory is exceeded -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6077) /bin/bash path is hardcoded in node manager
Miklos Szegedi created YARN-6077: Summary: /bin/bash path is hardcoded in node manager Key: YARN-6077 URL: https://issues.apache.org/jira/browse/YARN-6077 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi There should be a configuration similar to MRJobConfig.MAPRED_ADMIN_USER_SHELL -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6057) yarn.scheduler.minimum-allocation-* and yarn.scheduler.maximum-allocation-* descriptions are incorrect about behavior when a request is out of bounds
[ https://issues.apache.org/jira/browse/YARN-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812637#comment-15812637 ] Daniel Templeton commented on YARN-6057: That's my bad. I foolishly assumed that the max and min would be used symmetrically and didn't track down use of the max value independently. :) [~Juliasommer], can you please drop the changes to the maximum values and post a new patch? > yarn.scheduler.minimum-allocation-* and yarn.scheduler.maximum-allocation-* > descriptions are incorrect about behavior when a request is out of bounds > - > > Key: YARN-6057 > URL: https://issues.apache.org/jira/browse/YARN-6057 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Julia Sommer >Priority: Minor > Attachments: YARN-6057.001.patch > > > {code} > <property> > <description>The minimum allocation for every container request at the RM, > in terms of virtual CPU cores. Requests lower than this will throw a > InvalidResourceRequestException.</description> > <name>yarn.scheduler.minimum-allocation-vcores</name> > <value>1</value> > </property> > {code} > *Requests lower than this will throw a InvalidResourceRequestException.* > In fact, an InvalidResourceRequestException is only thrown when the maximum > allocation for vcores or memory is exceeded -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812626#comment-15812626 ] Naganarasimha G R commented on YARN-6072: - +1 to unblock for 2.8.0 > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > 
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > # AdminService > During resource manager service start() .EmbeddedElector starts first and > invokes {{AdminService#refreshAll()}} but {{AdminService#serviceStart()}} > happens after {{ActiveStandbyElectorBasedElectorService}} service start is >
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812620#comment-15812620 ] Jian He commented on YARN-5995: --- As said earlier, read metrics won't be that useful, as reads only happen on RM startup to load the data. It's a one-time value which does not require metrics. IMO, we need to think about how the metrics can actually be used for performance analysis, that is, how much the store operation can affect the RM's execution, i.e. how much delay it can incur. Metrics like data written per second look more like measuring ZK throughput, which may not be that useful. I think what we need is to surface the time spent on write operations. With that in mind, we may have 1) a Histogram of the time spent on each write op? 2) the total number of write operations. > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
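A minimal sketch of the proposed shape - the time spent per write op plus a total write counter - using the stock metrics2 types; the related discussion considers an HBase-style MutableHistogram instead, so treat this only as a stand-in, not the YARN-5995 patch.

{code}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Record the latency of each state-store write and count total writes.
class RMStateStoreMetricsSketch {
  private final MetricsRegistry registry = new MetricsRegistry("RMStateStore");
  private final MutableRate storeWriteTime = registry.newRate(
      "StoreWriteTime", "Time spent per state-store write op", false);
  private final MutableCounterLong numStoreWrites = registry.newCounter(
      "NumStoreWrites", "Total number of state-store write ops", 0L);

  void recordWrite(long elapsedMillis) {
    storeWriteTime.add(elapsedMillis); // exposes avg (min/max if extended)
    numStoreWrites.incr();
  }
}
{code}

An external tool scraping these values periodically gets the time series Jian describes, without the state store itself having to retain history.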
[jira] [Commented] (YARN-5937) stop-yarn.sh is not able to gracefully stop node managers
[ https://issues.apache.org/jira/browse/YARN-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812614#comment-15812614 ] Naganarasimha G R commented on YARN-5937: - Thanks [~cheersyang], and sorry for the delay. I just verified the trunk code; the problem was happening due to my local trunk code. Your approach is fine; I will commit it shortly. > stop-yarn.sh is not able to gracefully stop node managers > - > > Key: YARN-5937 > URL: https://issues.apache.org/jira/browse/YARN-5937 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Labels: script > Attachments: YARN-5937.01.patch, nm_shutdown.log > > > stop-yarn.sh always gives following output > {code} > ./sbin/stop-yarn.sh > Stopping resourcemanager > Stopping nodemanagers > : WARNING: nodemanager did not stop gracefully after 5 seconds: > Trying to kill with kill -9 > : ERROR: Unable to kill 18097 > {code} > this is because the resource manager is stopped before the node managers: when the > shutdown hook manager tries to gracefully stop NM services, the NM needs to > unregister with the RM, and it times out as the NM could not connect to the RM > (already stopped). See log (stop RM then run kill ) > {code} > 16/11/28 08:26:43 ERROR nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM > ... > 16/11/28 08:26:53 WARN util.ShutdownHookManager: ShutdownHook > 'CompositeServiceShutdownHook' timeout, java.util.concurrent.TimeoutException > java.util.concurrent.TimeoutException > at java.util.concurrent.FutureTask.get(FutureTask.java:205) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67) > ... > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:291) > ... > 16/11/28 08:27:13 ERROR util.ShutdownHookManager: ShutdownHookManger shutdown > forcefully. > {code} > the shutdown hook has a default 10s timeout, so if the RM is stopped before the > NMs, they always take more than 10s to stop (in the Java code). However > stop-yarn.sh only gives a 5s timeout, so the NM is always killed instead of stopped. > It would make sense to stop NMs before RMs in this script, in a graceful way. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6076) Backport YARN-4752 (FS preemption changes) to branch-2
Karthik Kambatla created YARN-6076: -- Summary: Backport YARN-4752 (FS preemption changes) to branch-2 Key: YARN-6076 URL: https://issues.apache.org/jira/browse/YARN-6076 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.8.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla YARN-4752 was merged to trunk a while ago, and has been stable. Creating this JIRA to merge it branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5556) Support for deleting queues without requiring a RM restart
[ https://issues.apache.org/jira/browse/YARN-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812579#comment-15812579 ] Hadoop QA commented on YARN-5556: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 6 new + 207 unchanged - 2 fixed = 213 total (was 209) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 42m 21s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 66m 18s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-5556 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846311/YARN-5556.v2.005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 452e00e08ed0 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 287d3d6 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/14608/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/14608/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14608/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output |
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812546#comment-15812546 ] Junping Du commented on YARN-6072: -- bq. YARN-5333 and YARN-5988 are not available in 2.8.0. So issue shouldn't happen in 2.8.0 I see. Sounds like YARN-5333 is the root cause. However, someone said YARN-5709 could be related, but from my quick check, it doesn't affect sequence of service start. [~ka...@cloudera.com] and [~jianhe], can you confirm YARN-5709 is not related? If so, we can drop 2.8.0 from affected version and target version to unblock our 2.8.0 RC. > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at >
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812539#comment-15812539 ] zhangyubiao commented on YARN-5995: --- Thanks. [~sunilg] > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5188) FairScheduler performance bug
[ https://issues.apache.org/jira/browse/YARN-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812533#comment-15812533 ] zhangyubiao edited comment on YARN-5188 at 1/9/17 6:56 PM: --- [~chenfolin], is it OK if I take over this JIRA? was (Author: piaoyu zhang): @ChenFolin it is ok for me to take over this JIRA? > FairScheduler performance bug > - > > Key: YARN-5188 > URL: https://issues.apache.org/jira/browse/YARN-5188 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.5.0 >Reporter: ChenFolin > Attachments: YARN-5188-1.patch > > > My Hadoop cluster has recently encountered a performance problem. Details as > follows. > There are two points which can cause this performance issue. > 1: applications are sorted before assigning a container in FSLeafQueue. TreeSet is > not the best choice. Why not keep the list ordered, and then use binary search to > keep it ordered when an application's resource usage has changed? > 2: queue sort and assignContainerPreCheck lead to computing the resource usage of > all leaf queues. Why can't we store the leaf queue usage in memory and update it > when an assign-container or release-container op happens? > > The efficiency of assigning containers in the ResourceManager may fall > when the number of running and pending applications grows. And the fact is the > cluster has too many PendingMB or PendingVcore, and the cluster's > current utilization rate may be below 20%. > I checked the ResourceManager logs and found that every assign > container op may cost 5 ~ 10 ms, but just 0 ~ 1 ms at usual times. > > I used TestFairScheduler to reproduce the scene: > > Just one queue: root.default > 10240 apps. > > assign container avg time: 6753.9 us ( 6.7539 ms ) > apps sort time (FSLeafQueue : Collections.sort(runnableApps, > comparator); ): 4657.01 us ( 4.657 ms ) > compute LeafQueue Resource usage : 905.171 us ( 0.905171 ms ) > > With just root.default, one assign container op contains : ( one apps > sort op ) + 2 * ( compute leafqueue usage op ) > According to the above, I think the assign container op has > a performance problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5188) FairScheduler performance bug
[ https://issues.apache.org/jira/browse/YARN-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812533#comment-15812533 ] zhangyubiao commented on YARN-5188: --- @ChenFolin, is it OK if I take over this JIRA? > FairScheduler performance bug > - > > Key: YARN-5188 > URL: https://issues.apache.org/jira/browse/YARN-5188 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.5.0 >Reporter: ChenFolin > Attachments: YARN-5188-1.patch > > > My Hadoop cluster has recently encountered a performance problem. Details as > follows. > There are two points which can cause this performance issue. > 1: applications are sorted before assigning a container in FSLeafQueue. TreeSet is > not the best choice. Why not keep the list ordered, and then use binary search to > keep it ordered when an application's resource usage has changed? > 2: queue sort and assignContainerPreCheck lead to computing the resource usage of > all leaf queues. Why can't we store the leaf queue usage in memory and update it > when an assign-container or release-container op happens? > > The efficiency of assigning containers in the ResourceManager may fall > when the number of running and pending applications grows. And the fact is the > cluster has too many PendingMB or PendingVcore, and the cluster's > current utilization rate may be below 20%. > I checked the ResourceManager logs and found that every assign > container op may cost 5 ~ 10 ms, but just 0 ~ 1 ms at usual times. > > I used TestFairScheduler to reproduce the scene: > > Just one queue: root.default > 10240 apps. > > assign container avg time: 6753.9 us ( 6.7539 ms ) > apps sort time (FSLeafQueue : Collections.sort(runnableApps, > comparator); ): 4657.01 us ( 4.657 ms ) > compute LeafQueue Resource usage : 905.171 us ( 0.905171 ms ) > > With just root.default, one assign container op contains : ( one apps > sort op ) + 2 * ( compute leafqueue usage op ) > According to the above, I think the assign container op has > a performance problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
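For the first point in the description, a rough sketch of keeping the runnable apps ordered and repositioning a single app with binary search instead of re-sorting the whole list on every allocation. The types are simplified stand-ins, not {{FSLeafQueue}} code.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Invariant: 'apps' stays sorted by 'comparator' between calls.
class OrderedAppsSketch<T> {
  private final List<T> apps = new ArrayList<>();
  private final Comparator<T> comparator;

  OrderedAppsSketch(Comparator<T> comparator) {
    this.comparator = comparator;
  }

  // O(log n) search plus O(n) shift, versus the O(n log n)
  // Collections.sort(runnableApps, comparator) per allocation cited above.
  void reposition(T app) {
    apps.remove(app); // usage changed, so the old position is stale
    int idx = Collections.binarySearch(apps, app, comparator);
    apps.add(idx < 0 ? -idx - 1 : idx, app);
  }
}
{code}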
[jira] [Commented] (YARN-5980) Update documentation for single node hbase deploy
[ https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812517#comment-15812517 ] Vrushali C commented on YARN-5980: -- Thanks [~varun_saxena] and [~sjlee0], yes I will update the patch. It's outdated actually, since we now use HBase 1.2.4. I will upload another patch shortly. > Update documentation for single node hbase deploy > - > > Key: YARN-5980 > URL: https://issues.apache.org/jira/browse/YARN-5980 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Labels: yarn-5355-merge-blocker > Attachments: YARN-5980.001.patch > > > Per HBASE-17272, a single node hbase deployment (single jvm running daemons + > hdfs writes) will be added to hbase shortly. > We should update the timeline service documentation in the setup/deployment > context accordingly; this will help users who are a bit wary of hbase > deployments get started with the timeline service more easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812498#comment-15812498 ] Hadoop QA commented on YARN-6054: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 43s{color} | {color:red} hadoop-yarn-server-applicationhistoryservice in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 22m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.timeline.webapp.TestTimelineWebServices | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6054 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846380/YARN-6054.03.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux a7d68c595185 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 287d3d6 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/14609/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14609/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14609/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > TimelineServer fails to start when some LevelDb state files are missing. > > > Key:
[jira] [Commented] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec
[ https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812488#comment-15812488 ] Allen Wittenauer commented on YARN-6060: bq. should we create a JIRA to make /bin/bash configurable in UnixShellScriptBuilder? Yes, please. /bin/bash is hard-coded in a few places in the Java code and they should all be changed to either be /usr/bin/env bash or pull from a system property/configuration entry so that the shell code can define where exactly bash is located on startup. > Linux container executor fails to run container on directories mounted as > noexec > > > Key: YARN-6060 > URL: https://issues.apache.org/jira/browse/YARN-6060 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, yarn >Reporter: Miklos Szegedi > Attachments: YARN-6060.000.patch, YARN-6060.001.patch > > > If node manager directories are mounted as noexec, LCE fails with the > following error: > Launching container... > Couldn't execute the container launch file > /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh > - Permission denied -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812459#comment-15812459 ] Naganarasimha G R commented on YARN-6054: - Thanks for the patch [~raviprakashu], bq. Also, as pointed out by Jason, (e.g. in the case of NM) graceful degradation would be a very hard thing to achieve. More likely, the state is corrupt and will cause undefined behavior. Agree, but maybe we can provide some kind of tool and a set of steps which can be taken to overcome it, as we too faced it once. But I agree it's not within this JIRA's scope! The changes look good enough; I will wait for the Jenkins report and, if there are no further comments, will commit it tomorrow! > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 
5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec
[ https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi resolved YARN-6060. -- Resolution: Won't Fix Assignee: (was: Miklos Szegedi) > Linux container executor fails to run container on directories mounted as > noexec > > > Key: YARN-6060 > URL: https://issues.apache.org/jira/browse/YARN-6060 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, yarn >Reporter: Miklos Szegedi > Attachments: YARN-6060.000.patch, YARN-6060.001.patch > > > If node manager directories are mounted as noexec, LCE fails with the > following error: > Launching container... > Couldn't execute the container launch file > /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh > - Permission denied -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812451#comment-15812451 ] Wangda Tan commented on YARN-5889: -- bq. Let's please do these as separate JIRAs. We are extremely anxious to move this JIRA forward since it is blocking YARN-2113 (user limit-based intra-queue preemption). I understand; however, I prefer to do the refactoring together with the patch. If we don't actively refactor to keep a clean code structure while making major behavior changes, it will cause a lot of trouble to maintain the code and add new functionality. I'm OK with moving other changes, like the partition-related ones, to a separate JIRA if they need considerable effort. > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.v0.patch, YARN-5889.v1.patch, > YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket is focusing on moving > user-limit calculation out of the heartbeat allocation flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated YARN-6054: --- Attachment: YARN-6054.03.patch Here's a patch with the improvements suggested by Naganarasimha. > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
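The degradation strategy in the patch discussion can be sketched as an open-with-repair fallback on the leveldbjni factory. This mirrors the general shape of the approach, under the assumption that a repair-and-retry is acceptable; it is not the committed YARN-6054 patch.

{code}
import java.io.File;
import java.io.IOException;
import org.fusesource.leveldbjni.JniDBFactory;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

final class LeveldbOpenSketch {
  // Try a normal open; on corruption (e.g. missing .sst files), attempt a
  // repair and retry once. repair() salvages what it can, so data may be lost.
  static DB openWithRepair(File path, Options options) throws IOException {
    try {
      return JniDBFactory.factory.open(path, options);
    } catch (IOException e) {
      System.err.println("Opening " + path + " failed: " + e.getMessage()
          + "; attempting repair");
      JniDBFactory.factory.repair(path, options);
      return JniDBFactory.factory.open(path, options);
    }
  }
}
{code}

Because repair can drop data, any real implementation should log the failure loudly, as the graceful-degradation caveats in the comments above suggest.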
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812432#comment-15812432 ] Ravi Prakash commented on YARN-6054: Thanks Naganarasimha for your careful review! As I posted in the first comment, the repair did indeed fix the issue for us (we had a production incident.) As I'm sure you'll understand, we can't post the leveldb files in the open source. # I feel this JIRA is very specific to the TimelineServer so I am hesitant to include other daemons. Also, as pointed out by Jason, (e.g. in the case of NM) graceful degradation would be a very hard thing to achieve. More likely, the state is corrupt and will cause undefined behavior. # Fair point. Will do. # Great idea. Will do. > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-6054.01.patch, YARN-6054.02.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 
5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
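For illustration, a graceful-degradation path could attempt a leveldb repair when the open fails, along the lines of the sketch below. This is only an assumed approach, not the contents of the attached patch; it leans on the {{repair()}} method that the org.iq80.leveldb {{DBFactory}} interface (implemented by leveldbjni's {{JniDBFactory}}) exposes.
{code}
import java.io.File;
import java.io.IOException;

import org.fusesource.leveldbjni.JniDBFactory;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

public final class TimelineStoreOpener {
  /**
   * Open the timeline leveldb store; on corruption (e.g. missing .sst
   * files, as in the trace above) attempt an automatic repair and retry
   * once instead of failing service init outright.
   */
  public static DB openWithRepair(File dbPath, Options options)
      throws IOException {
    try {
      return JniDBFactory.factory.open(dbPath, options);
    } catch (IOException e) {
      // Corruption surfaces as NativeDB$DBException, an IOException.
      JniDBFactory.factory.repair(dbPath, options);
      return JniDBFactory.factory.open(dbPath, options);
    }
  }
}
{code}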
[jira] [Commented] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec
[ https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812382#comment-15812382 ] Miklos Szegedi commented on YARN-6060: -- Thank you, [~vvasudev], [~aw] and [~templedf] for the comments. Since the error is a configuration error, I agree we should cancel the patch. [~aw], should we create a JIRA to make /bin/bash configurable in UnixShellScriptBuilder? > Linux container executor fails to run container on directories mounted as > noexec > > > Key: YARN-6060 > URL: https://issues.apache.org/jira/browse/YARN-6060 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, yarn >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi > Attachments: YARN-6060.000.patch, YARN-6060.001.patch > > > If node manager directories are mounted as noexec, LCE fails with the > following error: > Launching container... > Couldn't execute the container launch file > /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh > - Permission denied -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
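As background on the failure mode: the script's permission bits are fine on a noexec mount; it is the exec(2) call itself that is refused, which is why the error only appears at launch time. A probe therefore has to actually attempt execution. The sketch below is purely illustrative; the class and method names are invented and this is not what the attached patches do.
{code}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public final class ExecProbe {
  /** Returns false when dir sits on a noexec mount (exec is refused). */
  public static boolean canExecuteIn(File dir) {
    File probe = null;
    try {
      probe = File.createTempFile("probe", ".sh", dir);
      Files.write(probe.toPath(), "#!/bin/bash\nexit 0\n".getBytes("UTF-8"));
      probe.setExecutable(true);
      // exec(2) is what fails on noexec, so actually run the script.
      Process p = new ProcessBuilder(probe.getAbsolutePath()).start();
      return p.waitFor() == 0;
    } catch (IOException e) {
      return false; // "Permission denied" here is the noexec symptom
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      if (probe != null) {
        probe.delete();
      }
    }
  }
}
{code}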
[jira] [Commented] (YARN-6064) Support fromId for flowRuns and flow/flowRun apps REST API's
[ https://issues.apache.org/jira/browse/YARN-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812379#comment-15812379 ] Sangjin Lee commented on YARN-6064: --- Thanks [~rohithsharma] for the patch! (ApplicationRowKeyPrefix.java) - I'm not sure if this change is necessary. By providing the app id, we're specifying a complete row key, not a partial prefix. And I think adding this constructor for the prefix makes things somewhat confusing. We should simply use the {{ApplicationRowKey}} instance instead to handle the fromId. (ApplicationEntityReader.java) - as mentioned above, for the start row let's use something like this: {code} ApplicationRowKey startRow = new ApplicationRowKey( context.getClusterId(), context.getUserId(), context.getFlowName(), flowRunId, getFilters().getFromId()); // set start row scan.setStartRow(startRow.getRowKey()); {code} (FlowRunRowKeyPrefix.java) - the same comment applies here (FlowRunEntityReader.java) - the same comment applies here In addition, I think many of the checkstyle, javadoc, and whitespace issues are actionable and easy to fix, including the following: {noformat} ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1005: verifyFlowEntites(client, uri, 3, new int[] { 3, 2, 1 },:52: '{' is followed by whitespace. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1006: new String[] { "flow1", "flow_name", "flow_name2" });:25: '{' is followed by whitespace. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1011: verifyFlowEntites(client, uri, 3, new int[] { 3, 2, 1 },:52: '{' is followed by whitespace. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1012: new String[] { "flow1", "flow_name", "flow_name2" });:25: '{' is followed by whitespace. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1016: verifyFlowEntites(client, uri, 1, new int[] { 3 },:52: '{' is followed by whitespace. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1017: new String[] { "flow1" });:25: '{' is followed by whitespace. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1027: verifyFlowEntites(client, uri, 3, new int[] { 3, 2, 1 },:52: '{' is followed by whitespace. 
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1028: new String[] { "flow1", "flow_name", "flow_name2" });:25: '{' is followed by whitespace. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1038: verifyFlowEntites(client, uri, 3, new int[] { 3, 2, 1 },:52: '{' is followed by whitespace. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1039: new String[] { "flow1", "flow_name", "flow_name2" });:25: '{' is followed by whitespace. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1044: verifyFlowEntites(client, uri, 3, new int[] { 3, 2, 1 },:52: '{' is followed by whitespace. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1045: new String[] { "flow1", "flow_name", "flow_name2"
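These checkstyle items are mechanical to fix; the rule objects to the whitespace just inside the braces of the array initializers, so the corrected calls would look like this (fragment only, taken from the flagged test lines):
{code}
// before: '{' is followed by whitespace
verifyFlowEntites(client, uri, 3, new int[] { 3, 2, 1 },
    new String[] { "flow1", "flow_name", "flow_name2" });

// after: no padding inside the braces
verifyFlowEntites(client, uri, 3, new int[] {3, 2, 1},
    new String[] {"flow1", "flow_name", "flow_name2"});
{code}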
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812321#comment-15812321 ] Bibin A Chundatt commented on YARN-6072: +1 from my side too. Tried the same on my local cluster seems to be working fine. [~ajithshetty] In addition to changing order please do update logging and exception thrown. Currently we are losing trace. {code} @@ -708,7 +708,7 @@ void refreshAll() throws ServiceFailedException { } refreshClusterMaxPriority(); } catch (Exception ex) { + LOG.error(ex); - throw new ServiceFailedException(ex.getMessage()); + throw new ServiceFailedException(ex.getMessage(), ex); } {code} > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at >
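For concreteness, the suggested catch block would look roughly like the sketch below, assuming the two-argument (message, cause) constructor of {{ServiceFailedException}}. Chaining the cause is what preserves the "Caused by" trace that the current code loses.
{code}
try {
  // ... the individual refresh* calls ...
  refreshClusterMaxPriority();
} catch (Exception ex) {
  LOG.error("RefreshAll failed", ex);
  // Passing ex as the cause keeps the original stack trace in the
  // wrapped exception instead of reducing it to just a message.
  throw new ServiceFailedException(ex.getMessage(), ex);
}
{code}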
[jira] [Commented] (YARN-6074) FlowRunEntity does not deserialize long values correctly
[ https://issues.apache.org/jira/browse/YARN-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812308#comment-15812308 ] Sangjin Lee commented on YARN-6074: --- Thanks for catching this [~rohithsharma]. > FlowRunEntity does not deserialize long values correctly > > > Key: YARN-6074 > URL: https://issues.apache.org/jira/browse/YARN-6074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 3.0.0-alpha2, YARN-5355, YARN-5355-branch-2 > > Attachments: YARN-6074.patch > > > I see that the FlowRunEntity methods *getRunId()* and *getMaxEndTime()* do not > deserialize values safely, which causes a class cast exception depending on the > magnitude of the number. > {code} > public long getRunId() { > Object runId = getInfo().get(FLOW_RUN_ID_INFO_KEY); > return runId == null ? 0L : (Long) runId; > } > {code} > and > {code} > public long getMaxEndTime() { > Object time = getInfo().get(FLOW_RUN_END_TIME); > return time == null ? 0L : (Long)time; > } > {code} > The reason for the class cast exception is that JSON has a single Number type > covering all Java numeric primitives. So, if the number is within Integer's > range, the Object is converted to an Integer, which then fails to cast to Long. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
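A tolerant version of the getters widens through {{Number}} instead of casting to {{Long}} directly; that is the standard fix for this Integer-vs-Long boxing problem, though the attached patch may differ in detail.
{code}
public long getRunId() {
  Object runId = getInfo().get(FLOW_RUN_ID_INFO_KEY);
  // JSON deserialization can yield an Integer when the value fits in
  // 32 bits, so widen via Number rather than casting straight to Long.
  return runId == null ? 0L : ((Number) runId).longValue();
}

public long getMaxEndTime() {
  Object time = getInfo().get(FLOW_RUN_END_TIME);
  return time == null ? 0L : ((Number) time).longValue();
}
{code}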
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812251#comment-15812251 ] Eric Payne commented on YARN-5889: -- Thanks [~leftnoteasy] for your review and comments. bq. If everybody agrees with the approach in my comment I agree. I think [~jlowe] summed it up well when he said your "... proposal ... preserves the FIFO/priority behavior at least up through usage < MULP and then becomes fair once a user is beyond MULP." {quote} And in addition, we can add a UsersManager to LQ ... And lastly, we can look at the logic for #active-users per partition; right now we have an identical #active-users for all partitions {quote} Let's please do these as separate JIRAs. We are extremely anxious to move this JIRA forward since it is blocking YARN-2113 (user limit-based intra-queue preemption). > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.v0.patch, YARN-5889.v1.patch, > YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket focuses on moving > user-limit calculation out of the heartbeat allocation flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3955) Support for application priority ACLs in queues of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812228#comment-15812228 ] Hudson commented on YARN-3955: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #11090 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11090/]) YARN-3955. Support for application priority ACLs in queues of (wangda: rev 287d3d6804a869723ae36605a3c2d2b3eae3941e) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/AppPriorityACLsManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AccessType.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ACLsTestBase.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AppPriorityACLGroup.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerQueueManager.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriorityACLs.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf/capacity-scheduler.xml * (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AppPriorityACLConfigurationParser.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriorityACLConfiguration.java > Support for application priority ACLs in queues of CapacityScheduler > > > Key: YARN-3955 > URL: https://issues.apache.org/jira/browse/YARN-3955 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.9.0,
[jira] [Commented] (YARN-6062) nodemanager memory leak
[ https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812206#comment-15812206 ] Bibin A Chundatt commented on YARN-6062: [~gehaijiang] Could you provide the Java version used? > nodemanager memory leak > --- > > Key: YARN-6062 > URL: https://issues.apache.org/jira/browse/YARN-6062 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: gehaijiang > Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt > > > PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND > 8986 data 20 0 21.3g 19g 7376 S 5.5 20.7 2458:09 java > 38432 data 20 0 9.8g 7.9g 6300 S 95.5 8.4 35273:23 java > 6653 data 20 0 4558m 3.4g 10m S 9.2 3.6 6640:37 java > $ jps > 6653 NodeManager > NodeManager memory keeps growing and has reached 10 GB. > The nodemanager yarn-env.sh configuration sets a 2 GB heap: > YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m > -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3955) Support for application priority ACLs in queues of CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3955: - Summary: Support for application priority ACLs in queues of CapacityScheduler (was: Support for priority ACLs in CapacityScheduler) > Support for application priority ACLs in queues of CapacityScheduler > > > Key: YARN-3955 > URL: https://issues.apache.org/jira/browse/YARN-3955 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: ApplicationPriority-ACL.pdf, > ApplicationPriority-ACLs-v2.pdf, YARN-3955.0001.patch, YARN-3955.0002.patch, > YARN-3955.0003.patch, YARN-3955.0004.patch, YARN-3955.0005.patch, > YARN-3955.0006.patch, YARN-3955.0007.patch, YARN-3955.0008.patch, > YARN-3955.0009.patch, YARN-3955.0010.patch, YARN-3955.v0.patch, > YARN-3955.v1.patch, YARN-3955.wip1.patch > > > Support will be added for User-level access permission to use different > application-priorities. This is to avoid situations where all users try > running max priority in the cluster and thus degrading the value of > priorities. > Access Control Lists can be set per priority level within each queue. Below > is an example configuration that can be added in capacity scheduler > configuration > file for each Queue level. > yarn.scheduler.capacity.root...acl=user1,user2 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812187#comment-15812187 ] Wangda Tan commented on YARN-5889: -- Thanks [~sunilg] for providing the patch, and [~eepayne] for the reviews. I quickly scanned the patch. If everybody agrees with the approach in my comment: https://issues.apache.org/jira/browse/YARN-5889?focusedCommentId=15749129=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15749129 {code} active-user-limit = max( resource-used-by-active-users / #active-users, queue-capacity * MULP ) {code} We can: 1) Update {{resource-used-by-active-users}} when any user enters/exits the set of active users. 2) Update #active-users (using an approach like the ActiveUsersManager approach). Both #1 and #2 can be done within the same thread, and since we will have an identical UL for all the active users, I think this can be done without adding a new thread. Please correct me if I'm wrong. It looks like Sunil and Eric have different implementation suggestions; I have looked at the details of the two approaches. However, I think the goals are: a. Always be able to get an up-to-date UL. b. Avoid re-computation of UL as much as possible. Any approach that achieves the above goals is good to me. In addition, we can add a UsersManager to LQ to manage all user-related information, such as user-limit / active-user-limit / #active-users. Ideally, LQ/PCPP should get activeUserLimit / userLimit from the UsersManager. Lastly, we can look at the logic for #active-users per partition; right now we have an identical #active-users for all partitions, which causes some problems. Since we will be making major changes to the logic around this, it could be a good chance to fix that problem as well. > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.v0.patch, YARN-5889.v1.patch, > YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket focuses on moving > user-limit calculation out of the heartbeat allocation flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
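In scalar form the proposed computation is straightforward. The sketch below is illustrative only; the scheduler actually combines Resource objects (e.g. via Resources.max) rather than plain longs, and the parameter names are invented.
{code}
/** Scalar sketch of the proposed active-user limit. */
static long activeUserLimit(long resourceUsedByActiveUsers,
    int numActiveUsers, long queueCapacity, float mulp /* 0..1 */) {
  // Fair share of what the active users are collectively using.
  long fairShare = resourceUsedByActiveUsers / Math.max(1, numActiveUsers);
  // Each user is guaranteed at least MULP percent of queue capacity.
  long guaranteed = (long) (queueCapacity * mulp);
  return Math.max(fairShare, guaranteed);
}
{code}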
[jira] [Comment Edited] (YARN-6017) node manager physical memory leak
[ https://issues.apache.org/jira/browse/YARN-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812167#comment-15812167 ] Bibin A Chundatt edited comment on YARN-6017 at 1/9/17 4:36 PM: [~chenrongwei] This could be because of JDK bug. Found the following link [JDK-8054841|http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8054841] during reading. As per the defect description {{ProcessBuilder leaks native memory}} . {{Shell.java}} do use {{ProcessBuilder}} for all child process launch {quote} For the Oracle JDK, it does not appear to be present in 7u45, but does appear to be present in 7u55. {quote} Looks like the issue is available in 7u65 . If possible could you try changing JRE version. [GITHUB|https://github.com/cprice404/jvm-processbuilder-leak/blob/master/README.md] {quote} For Oracle JDK, the leak does not appear to be present in 7u45, but does appear to be present in 7u55. (I believe, though I have less data, that the leak is not present in Oracle 7u51. The leak definitely appears to be present in Oracle 7u65 and 7u67 as well.) For OpenJDK, the leak does not appear to be present in 7u55, but does appear to be present in 7u65. {quote} was (Author: bibinchundatt): [~chenrongwei] This could be because of JDK bug. Found the following link [JDK-8054841|http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8054841] during reading. As per the defect description {{ProcessBuilder leaks native memory}} . {{Shell.java}} do use {{ProcessBuilder}} for all child process launch {quote} For the Oracle JDK, it does not appear to be present in 7u45, but does appear to be present in 7u55. {quote} Looks like the issue is available in 7u65 . If possible could you try changing JRE version. > node manager physical memory leak > - > > Key: YARN-6017 > URL: https://issues.apache.org/jira/browse/YARN-6017 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 > Environment: OS: > Linux guomai124041 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 > x86_64 x86_64 x86_64 GNU/Linux > jvm: > java version "1.7.0_65" > Java(TM) SE Runtime Environment (build 1.7.0_65-b17) > Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode) >Reporter: chenrongwei > Attachments: 31169_smaps.txt, 31169_smaps.txt > > > In our produce environment, node manager's jvm memory has been set to > '-Xmx2048m',but we notice that after a long time running the process' actual > physical memory size had been reached to 12g (we got this value by top > command as follow). 
> PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND > 31169 data 20 0 13.2g 12g 6092 S 16.9 13.0 49183:13 java > 31169: /usr/local/jdk/bin/java -Dproc_nodemanager -Xmx2048m > -Dhadoop.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs > -Dyarn.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs > -Dhadoop.log.file=yarn-data-nodemanager.log > -Dyarn.log.file=yarn-data-nodemanager.log -Dyarn.home.dir= -Dyarn.id.str=data > -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA > -Djava.library.path=/home/data/programs/apache-hadoop-2.7.1/lib/native > -Dyarn.policy.file=hadoop-policy.xml -XX:PermSize=128M -XX:MaxPermSize=256M > -XX:+UseC > Address Kbytes Mode Offset DeviceMapping > 0040 4 r-x-- 008:1 java > 0060 4 rw--- 008:1 java > 00601000 10094936 rw--- 000:0 [ anon ] > 00077000 2228224 rw--- 000:0 [ anon ] > 0007f800 131072 rw--- 000:0 [ anon ] > 00325ee0 128 r-x-- 008:1 ld-2.12.so > 00325f01f000 4 r 0001f000 008:1 ld-2.12.so > 00325f02 4 rw--- 0002 008:1 ld-2.12.so > 00325f021000 4 rw--- 000:0 [ anon ] > 00325f201576 r-x-- 008:1 libc-2.12.so > 00325f38a0002048 - 0018a000 008:1 libc-2.12.so > 00325f58a000 16 r 0018a000 008:1 libc-2.12.so > 00325f58e000 4 rw--- 0018e000 008:1 libc-2.12.so > 00325f58f000 20 rw--- 000:0 [ anon ] > 00325f60 92 r-x-- 008:1 libpthread-2.12.so > 00325f6170002048 - 00017000 008:1 libpthread-2.12.so > 00325f817000 4 r 00017000 008:1 libpthread-2.12.so > 00325f818000 4 rw--- 00018000 008:1 libpthread-2.12.so > 00325f819000 16 rw--- 000:0 [ anon ] > 00325fa0 8 r-x-- 008:1 libdl-2.12.so > 00325fa020002048
[jira] [Commented] (YARN-6017) node manager physical memory leak
[ https://issues.apache.org/jira/browse/YARN-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812167#comment-15812167 ] Bibin A Chundatt commented on YARN-6017: [~chenrongwei] This could be because of JDK bug. Found the following link [JDK-8054841|http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8054841] during reading. As per the defect description {{ProcessBuilder leaks native memory}} . {{Shell.java}} do use {{ProcessBuilder}} for all child process launch {quote} For the Oracle JDK, it does not appear to be present in 7u45, but does appear to be present in 7u55. {quote} Looks like the issue is available in 7u65 . If possible could you try changing JRE version. > node manager physical memory leak > - > > Key: YARN-6017 > URL: https://issues.apache.org/jira/browse/YARN-6017 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 > Environment: OS: > Linux guomai124041 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 > x86_64 x86_64 x86_64 GNU/Linux > jvm: > java version "1.7.0_65" > Java(TM) SE Runtime Environment (build 1.7.0_65-b17) > Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode) >Reporter: chenrongwei > Attachments: 31169_smaps.txt, 31169_smaps.txt > > > In our produce environment, node manager's jvm memory has been set to > '-Xmx2048m',but we notice that after a long time running the process' actual > physical memory size had been reached to 12g (we got this value by top > command as follow). > PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND > 31169 data 20 0 13.2g 12g 6092 S 16.9 13.0 49183:13 java > 31169: /usr/local/jdk/bin/java -Dproc_nodemanager -Xmx2048m > -Dhadoop.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs > -Dyarn.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs > -Dhadoop.log.file=yarn-data-nodemanager.log > -Dyarn.log.file=yarn-data-nodemanager.log -Dyarn.home.dir= -Dyarn.id.str=data > -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA > -Djava.library.path=/home/data/programs/apache-hadoop-2.7.1/lib/native > -Dyarn.policy.file=hadoop-policy.xml -XX:PermSize=128M -XX:MaxPermSize=256M > -XX:+UseC > Address Kbytes Mode Offset DeviceMapping > 0040 4 r-x-- 008:1 java > 0060 4 rw--- 008:1 java > 00601000 10094936 rw--- 000:0 [ anon ] > 00077000 2228224 rw--- 000:0 [ anon ] > 0007f800 131072 rw--- 000:0 [ anon ] > 00325ee0 128 r-x-- 008:1 ld-2.12.so > 00325f01f000 4 r 0001f000 008:1 ld-2.12.so > 00325f02 4 rw--- 0002 008:1 ld-2.12.so > 00325f021000 4 rw--- 000:0 [ anon ] > 00325f201576 r-x-- 008:1 libc-2.12.so > 00325f38a0002048 - 0018a000 008:1 libc-2.12.so > 00325f58a000 16 r 0018a000 008:1 libc-2.12.so > 00325f58e000 4 rw--- 0018e000 008:1 libc-2.12.so > 00325f58f000 20 rw--- 000:0 [ anon ] > 00325f60 92 r-x-- 008:1 libpthread-2.12.so > 00325f6170002048 - 00017000 008:1 libpthread-2.12.so > 00325f817000 4 r 00017000 008:1 libpthread-2.12.so > 00325f818000 4 rw--- 00018000 008:1 libpthread-2.12.so > 00325f819000 16 rw--- 000:0 [ anon ] > 00325fa0 8 r-x-- 008:1 libdl-2.12.so > 00325fa020002048 - 2000 008:1 libdl-2.12.so > 00325fc02000 4 r 2000 008:1 libdl-2.12.so > 00325fc03000 4 rw--- 3000 008:1 libdl-2.12.so > 00325fe0 28 r-x-- 008:1 librt-2.12.so > 00325fe070002044 - 7000 008:1 librt-2.12.so > 003260006000 4 r 6000 008:1 librt-2.12.so > 003260007000 4 rw--- 7000 008:1 librt-2.12.so > 00326020 524 r-x-- 008:1 libm-2.12.so > 0032602830002044 - 00083000 008:1 libm-2.12.so > 003260482000 4 r 00082000 008:1 libm-2.12.so > 003260483000 4 rw--- 00083000 008:1 libm-2.12.so 
> 00326120 88 r-x-- 008:1 libresolv-2.12.so > 0032612160002048 - 00016000 008:1 libresolv-2.12.so > 003261416000 4 r 00016000 008:1 libresolv-2.12.so > 003261417000 4 rw--- 00017000
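For context, the launch pattern at issue is roughly the following. Per the JDK bug cited above, affected JREs leak native memory inside the process machinery itself, so the RES growth far beyond -Xmx is neither visible to nor fixable from this calling code.
{code}
public class LaunchPattern {
  // Simplified version of what Shell.java does for every child process
  // (container launch, health script, disk checks, ...).
  public static int run() throws Exception {
    ProcessBuilder builder = new ProcessBuilder("bash", "-c", "true");
    Process process = builder.start();
    try {
      return process.waitFor();
    } finally {
      process.destroy();
    }
  }
}
{code}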
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812128#comment-15812128 ] Eric Payne commented on YARN-5889: -- bq. Since the number of active users has changed, we need to recalculate all active users' limits, correct? Correct. Comparing each user's "cached recompute count" value against the queue's value should trigger that to happen when {{getComputedActiveUserLimit}} or {{getComputedUserLimit}} is called for each specific user. bq. I need two cachedLimit values in the user data structure (one for active users and another for all users). Are you asking if we need two versions of {{cachedRecalcULCount}} in the {{User}} class? If that's the question, then no, I don't think so. The queue value will change for all of the conditions outlined in [~jlowe]'s [comment (above)|https://issues.apache.org/jira/browse/YARN-5889?focusedCommentId=15745552=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15745552], and that will trigger the recalculation. However, I do think {{reComputeUserLimits}} needs to be modified to update {{preComputedUserLimit.get}} when recomputing user limits. That is not done anywhere in the current patch. > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.v0.patch, YARN-5889.v1.patch, > YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket focuses on moving > user-limit calculation out of the heartbeat allocation flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
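The scheme being discussed amounts to a version stamp: the queue bumps a counter on any invalidating event, and each user lazily recomputes its limit when its cached stamp falls behind. A hypothetical sketch, with all names invented for illustration:
{code}
import java.util.concurrent.atomic.AtomicLong;

class LeafQueueSketch {
  // Bumped on any event that invalidates user limits (app submitted,
  // container allocated/released, user activated/deactivated, ...).
  private final AtomicLong ulRecalcCount = new AtomicLong();

  void invalidateUserLimits() {
    ulRecalcCount.incrementAndGet();
  }

  class User {
    private long cachedRecalcULCount = -1; // stale by construction
    private long cachedUserLimit;

    long getComputedUserLimit() {
      long current = ulRecalcCount.get();
      if (cachedRecalcULCount != current) {
        // Stale: recompute once, then serve from the cache until the
        // queue-level count moves again.
        cachedUserLimit = recomputeUserLimit();
        cachedRecalcULCount = current;
      }
      return cachedUserLimit;
    }
  }

  long recomputeUserLimit() {
    return 0L; // placeholder for the actual user-limit math
  }
}
{code}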
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15811877#comment-15811877 ] Sunil G commented on YARN-5995: --- A few suggestions here: ||Op Name||Metric|| |Total no. of write operations|MutableCounter| |Total no. of read operations|MutableCounter| |No. of write ops per sec for the past 1, 5, 15 mins|Histogram| |No. of read ops per sec for the past 1, 5, 15 mins|Histogram| |Amount of data written per sec/minute|Histogram| |Amount of data read per sec/minute|Histogram| [~jianhe] and [~templedf], could you also please take a look at the proposed metric choices? > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
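A rough shape for such a metrics source, using the org.apache.hadoop.metrics2 annotations, might look like the sketch below. This is only a sketch, not the patch: MutableRate (op count plus average time) stands in for the histogram rows of the table until a histogram type is available in the metrics library.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "RMStateStore operation metrics", context = "yarn")
class RMStateStoreMetricsSketch {
  @Metric("Total number of state-store write operations")
  MutableCounterLong writeOps;

  @Metric("Total number of state-store read operations")
  MutableCounterLong readOps;

  // Stand-ins for the proposed histograms: MutableRate tracks the
  // number of ops and average latency per sampling interval.
  @Metric("State-store write latency")
  MutableRate writeLatency;

  @Metric("State-store read latency")
  MutableRate readLatency;
}
{code}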
[jira] [Commented] (YARN-6064) Support fromId for flowRuns and flow/flowRun apps REST API's
[ https://issues.apache.org/jira/browse/YARN-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15811871#comment-15811871 ] Hadoop QA commented on YARN-6064: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 40s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 30s{color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 39s{color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 33s{color} | {color:green} YARN-5355 passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} YARN-5355 passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 35s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 19 new + 23 unchanged - 6 fixed = 42 total (was 29) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 15s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 53s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 0s{color} | {color:green} hadoop-yarn-server-timelineservice-hbase-tests in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 34m 3s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Issue | YARN-6064 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846324/YARN-6064-YARN-5355.0002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs
[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15811817#comment-15811817 ] Sunil G commented on YARN-5889: --- Thanks [~eepayne] for the detailed explanation. I was also having a similar thought. Just one point to clarify here: bq. if the number of active users has increased or decreased, all active users in preComputedActiveUserLimit are invalidated, and not just the one that was activated/deactivated. This requires recalculation for other users when it is not necessary. Since the number of active users has changed, we need to recalculate all active users' limits, correct? Because we divide total-resource-used-by-active-users by the active-user count. In the proposed patch as well, the cached limit will disagree with the actual user count when we query the user-limit for that user. In my patch, I cleared the whole map because of that. Could you please help to elaborate a little more? I also feel cachedLimit makes the code simpler, hence no issue in making that change. However, I need two cachedLimit values in the user data structure (one for active users and another for all users). Is my thinking in line with yours? Please help to clarify. Thank you. > Improve user-limit calculation in capacity scheduler > > > Key: YARN-5889 > URL: https://issues.apache.org/jira/browse/YARN-5889 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5889.0001.patch, > YARN-5889.0001.suggested.patchnotes, YARN-5889.v0.patch, YARN-5889.v1.patch, > YARN-5889.v2.patch > > > Currently user-limit is computed during every heartbeat allocation cycle with > a write lock. To improve performance, this ticket focuses on moving > user-limit calculation out of the heartbeat allocation flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org