[jira] [Commented] (YARN-5350) Ensure LocalScheduler does not lose the sort order of allocatable nodes returned by the RM
[ https://issues.apache.org/jira/browse/YARN-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385407#comment-15385407 ] Hadoop QA commented on YARN-5350: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 47s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 22 unchanged - 1 fixed = 22 total (was 23) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 19s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 28m 4s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.TestDirectoryCollection | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818962/YARN-5350.003.patch | | JIRA Issue | YARN-5350 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 7040699c7987 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8f0d3d6 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/12377/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit test logs | https://builds.apache.org/job/PreCommit-YARN-Build/12377/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12377/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12377/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (YARN-5340) Race condition in RollingLevelDBTimelineStore#getAndSetStartTime()
[ https://issues.apache.org/jira/browse/YARN-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385396#comment-15385396 ] Hadoop QA commented on YARN-5340: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 11s {color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 15m 56s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818915/YARN-5340-trunk.002.patch | | JIRA Issue | YARN-5340 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux f2206a99ca13 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8f0d3d6 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12379/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12379/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Race condition in RollingLevelDBTimelineStore#getAndSetStartTime() > -- > > Key: YARN-5340 > URL: https://issues.apache.org/jira/browse/YARN-5340 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter:
[jira] [Commented] (YARN-4464) default value of yarn.resourcemanager.state-store.max-completed-applications should be lowered.
[ https://issues.apache.org/jira/browse/YARN-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385382#comment-15385382 ] Naganarasimha G R commented on YARN-4464: - +1 for Option 3 > default value of yarn.resourcemanager.state-store.max-completed-applications > should be lowered. > -- > > Key: YARN-4464 > URL: https://issues.apache.org/jira/browse/YARN-4464 > Project: Hadoop YARN > Issue Type: Wish > Components: resourcemanager >Reporter: KWON BYUNGCHANG >Assignee: Daniel Templeton >Priority: Blocker > Attachments: YARN-4464.001.patch, YARN-4464.002.patch, > YARN-4464.003.patch, YARN-4464.004.patch > > > my cluster has 120 nodes. > I configured the RM Restart feature. > {code} > yarn.resourcemanager.recovery.enabled=true > yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore > yarn.resourcemanager.fs.state-store.uri=/system/yarn/rmstore > {code} > unfortunately, I did not configure > {{yarn.resourcemanager.state-store.max-completed-applications}}, > so that property took its default value of 10,000. > I restarted the RM after changing another configuration setting. > I expected the RM to restart immediately, but the recovery process was very > slow; I waited about 20 minutes before realizing that > {{yarn.resourcemanager.state-store.max-completed-applications}} was missing. > Its default value is very large. > We need to change it to a lower value or document a notice on the [RM Restart > page|http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5403) yarn top command does not execute correctly
[ https://issues.apache.org/jira/browse/YARN-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385347#comment-15385347 ] Bibin A Chundatt commented on YARN-5403: Duplicate of YARN-4232. > yarn top command does not execute correctly > - > > Key: YARN-5403 > URL: https://issues.apache.org/jira/browse/YARN-5403 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.2 >Reporter: gu-chi > Attachments: YARN-5403.patch > > > when I execute {{yarn top}}, I always get the exception below: > {quote} > 16/07/19 19:55:12 ERROR cli.TopCLI: Could not fetch RM start time > java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:589) > at java.net.Socket.connect(Socket.java:538) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient.<init>(HttpClient.java:211) > at sun.net.www.http.HttpClient.New(HttpClient.java:308) > at sun.net.www.http.HttpClient.New(HttpClient.java:326) > at > sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169) > at > sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105) > at > sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999) > at > sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933) > at > org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:747) > at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:443) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:421) > YARN top - 19:55:13, up 17001d, 11:55, 0 active users, queue(s): root > {quote} > Looking into it, the function {{getRMStartTime}} hardcodes HTTP > no matter what the {{yarn.http.policy}} setting is; it should consider using > HTTPS -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
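To make the suggested fix concrete: {{YarnConfiguration.useHttps}} already encapsulates the {{yarn.http.policy}} check, so the scheme and web-app address can be derived from configuration instead of hardcoded. A minimal sketch along those lines (an illustration only, not the contents of YARN-5403.patch):

{code}
import java.net.URL;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: derive http vs https from yarn.http.policy rather than
// hardcoding "http://" as TopCLI#getRMStartTime currently does.
public class RmSchemeSketch {
  public static URL rmClusterInfoUrl(Configuration conf) throws Exception {
    boolean https = YarnConfiguration.useHttps(conf);
    String scheme = https ? "https://" : "http://";
    // Pick the matching web-app address key for the chosen scheme.
    String address = https
        ? conf.get(YarnConfiguration.RM_WEBAPP_HTTPS_ADDRESS,
            YarnConfiguration.DEFAULT_RM_WEBAPP_HTTPS_ADDRESS)
        : conf.get(YarnConfiguration.RM_WEBAPP_ADDRESS,
            YarnConfiguration.DEFAULT_RM_WEBAPP_ADDRESS);
    return new URL(scheme + address + "/ws/v1/cluster/info");
  }
}
{code}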
[jira] [Commented] (YARN-4464) default value of yarn.resourcemanager.state-store.max-completed-applications should be lowered.
[ https://issues.apache.org/jira/browse/YARN-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385320#comment-15385320 ] Jian He commented on YARN-4464: --- I vote for 3), which solves the slowness problem and preserves the behavior to some extent. > default value of yarn.resourcemanager.state-store.max-completed-applications > should be lowered. > -- > > Key: YARN-4464 > URL: https://issues.apache.org/jira/browse/YARN-4464 > Project: Hadoop YARN > Issue Type: Wish > Components: resourcemanager >Reporter: KWON BYUNGCHANG >Assignee: Daniel Templeton >Priority: Blocker > Attachments: YARN-4464.001.patch, YARN-4464.002.patch, > YARN-4464.003.patch, YARN-4464.004.patch > > > my cluster has 120 nodes. > I configured the RM Restart feature. > {code} > yarn.resourcemanager.recovery.enabled=true > yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore > yarn.resourcemanager.fs.state-store.uri=/system/yarn/rmstore > {code} > unfortunately, I did not configure > {{yarn.resourcemanager.state-store.max-completed-applications}}, > so that property took its default value of 10,000. > I restarted the RM after changing another configuration setting. > I expected the RM to restart immediately, but the recovery process was very > slow; I waited about 20 minutes before realizing that > {{yarn.resourcemanager.state-store.max-completed-applications}} was missing. > Its default value is very large. > We need to change it to a lower value or document a notice on the [RM Restart > page|http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4997) Update fair scheduler to use pluggable auth provider
[ https://issues.apache.org/jira/browse/YARN-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385256#comment-15385256 ] Tao Jie commented on YARN-4997: --- Fix for review. > Update fair scheduler to use pluggable auth provider > > > Key: YARN-4997 > URL: https://issues.apache.org/jira/browse/YARN-4997 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Tao Jie > Attachments: YARN-4997-001.patch > > > Now that YARN-3100 has made the authorization pluggable, it should be > supported by the fair scheduler. YARN-3100 only updated the capacity > scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4997) Update fair scheduler to use pluggable auth provider
[ https://issues.apache.org/jira/browse/YARN-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Jie updated YARN-4997: -- Attachment: YARN-4997-001.patch > Update fair scheduler to use pluggable auth provider > > > Key: YARN-4997 > URL: https://issues.apache.org/jira/browse/YARN-4997 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Tao Jie > Attachments: YARN-4997-001.patch > > > Now that YARN-3100 has made the authorization pluggable, it should be > supported by the fair scheduler. YARN-3100 only updated the capacity > scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5390) Federation Subcluster Resolver
[ https://issues.apache.org/jira/browse/YARN-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-5390: - Assignee: Ellen Hui > Federation Subcluster Resolver > -- > > Key: YARN-5390 > URL: https://issues.apache.org/jira/browse/YARN-5390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Ellen Hui > > This JIRA tracks the effort to create a mechanism to resolve node/rack resource > names to sub-cluster identifiers. This is needed by the federation policies > in YARN-5323, YARN-5324, YARN-5325 to operate correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4464) default value of yarn.resourcemanager.state-store.max-completed-applications should be lowered.
[ https://issues.apache.org/jira/browse/YARN-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385192#comment-15385192 ] Karthik Kambatla commented on YARN-4464: I know there is no right answer here. We should have picked a better default to begin with. IAC, my preference would be whatever least astonishes the admins/users. Options sorted by least astonishment: # Don't change anything. Keep it at 10,000 and deal with recovery slowness etc. # Change it to 0. When people try out Hadoop 3 and fail over, they immediately realize they don't see any completed applications. However, they will all likely have to change it. # Change it to 1,000. People will realize it later, but most users might never run into any issues. By the way, one other change we should make is to limit {{rm.store.max-completed-apps}} to {{rm.max-completed-apps}}. > default value of yarn.resourcemanager.state-store.max-completed-applications > should be lowered. > -- > > Key: YARN-4464 > URL: https://issues.apache.org/jira/browse/YARN-4464 > Project: Hadoop YARN > Issue Type: Wish > Components: resourcemanager >Reporter: KWON BYUNGCHANG >Assignee: Daniel Templeton >Priority: Blocker > Attachments: YARN-4464.001.patch, YARN-4464.002.patch, > YARN-4464.003.patch, YARN-4464.004.patch > > > my cluster has 120 nodes. > I configured the RM Restart feature. > {code} > yarn.resourcemanager.recovery.enabled=true > yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore > yarn.resourcemanager.fs.state-store.uri=/system/yarn/rmstore > {code} > unfortunately, I did not configure > {{yarn.resourcemanager.state-store.max-completed-applications}}, > so that property took its default value of 10,000. > I restarted the RM after changing another configuration setting. > I expected the RM to restart immediately, but the recovery process was very > slow; I waited about 20 minutes before realizing that > {{yarn.resourcemanager.state-store.max-completed-applications}} was missing. > Its default value is very large. > We need to change it to a lower value or document a notice on the [RM Restart > page|http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
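For concreteness, option 3 combined with the cap suggested at the end of the comment above would amount to configuration along these lines, in the same format as the snippet in the issue description (illustrative values only; the default finally committed on this JIRA may differ):

{code}
# Illustrative values for option 3; not necessarily the values committed.
yarn.resourcemanager.state-store.max-completed-applications=1000
# Keep the store limit no larger than the in-memory limit, per the
# suggestion above (the property below defaults to 10000):
yarn.resourcemanager.max-completed-applications=1000
{code}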
[jira] [Updated] (YARN-3664) Federation PolicyStore APIs
[ https://issues.apache.org/jira/browse/YARN-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-3664: - Attachment: YARN-3664-YARN-2915-v1.patch Updated patch (v1) that incorporates [~leftnoteasy]'s [feedback|https://issues.apache.org/jira/browse/YARN-3662?focusedCommentId=15375947]: * Included {{FederationPolicyStore}} API class. I had missed it in previous patch, good catch. * Renamed record classes and updated Javadoc (with help from [~curino]) to make it more understandable. For more context, kindly refer to [~curino]'s [summary|https://issues.apache.org/jira/browse/YARN-5323?focusedCommentId=15380907] and associated policy patches in YARN-5323. > Federation PolicyStore APIs > --- > > Key: YARN-3664 > URL: https://issues.apache.org/jira/browse/YARN-3664 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-3664-YARN-2915-v0.patch, > YARN-3664-YARN-2915-v1.patch > > > The federation Policy Store contains information about the capacity > allocations made by users, their mapping to sub-clusters and the policies > that each of the components (Router, AMRMPRoxy, RMs) should enforce -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5203) Return ResourceRequest JAXB object in ResourceManager Cluster Applications REST API
[ https://issues.apache.org/jira/browse/YARN-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385137#comment-15385137 ] Ellen Hui commented on YARN-5203: - Hi [~sunilg], thanks for the comment. I tested this more thoroughly and [~subru] is right; keeping the old resourceRequests element causes an UnmarshalException in the Federation Router, which was the original symptom of this bug. > Return ResourceRequest JAXB object in ResourceManager Cluster Applications > REST API > --- > > Key: YARN-5203 > URL: https://issues.apache.org/jira/browse/YARN-5203 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Subru Krishnan >Assignee: Ellen Hui > Attachments: YARN-5203.v0.patch, YARN-5203.v1.patch > > > The ResourceManager Cluster Applications REST API returns {{ResourceRequest}} > as String rather than a JAXB object. This prevents downstream tools like > Federation Router (YARN-3659) that depend on the REST API to unmarshall the > {{AppInfo}}. This JIRA proposes updating {{AppInfo}} to return a JAXB version > of the {{ResourceRequest}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5340) Race condition in RollingLevelDBTimelineStore#getAndSetStartTime()
[ https://issues.apache.org/jira/browse/YARN-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385116#comment-15385116 ] Hadoop QA commented on YARN-5340: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 52s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 9s {color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 15m 8s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818915/YARN-5340-trunk.002.patch | | JIRA Issue | YARN-5340 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux f13f570de975 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / dc065dd | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12376/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12376/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Race condition in RollingLevelDBTimelineStore#getAndSetStartTime() > -- > > Key: YARN-5340 > URL: https://issues.apache.org/jira/browse/YARN-5340 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter:
[jira] [Commented] (YARN-5203) Return ResourceRequest JAXB object in ResourceManager Cluster Applications REST API
[ https://issues.apache.org/jira/browse/YARN-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385099#comment-15385099 ] Subru Krishnan commented on YARN-5203: -- [~sunilg], thanks for taking a look and bringing up the important point on compatibility. IIUC, unfortunately we cannot continue to have the raw RRs, as JAXB will not be able to unmarshal directly to the {{AppInfo}} object. Today it works because existing clients like the UI deserialize directly to _String_, which should continue to work even if we use a JAXB object for RRs. [~ellenfkh], can you kindly do a quick validation, as you have a setup where you have already been testing extensively. Thanks! > Return ResourceRequest JAXB object in ResourceManager Cluster Applications > REST API > --- > > Key: YARN-5203 > URL: https://issues.apache.org/jira/browse/YARN-5203 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Subru Krishnan >Assignee: Ellen Hui > Attachments: YARN-5203.v0.patch, YARN-5203.v1.patch > > > The ResourceManager Cluster Applications REST API returns {{ResourceRequest}} > as String rather than a JAXB object. This prevents downstream tools like > Federation Router (YARN-3659) that depend on the REST API to unmarshall the > {{AppInfo}}. This JIRA proposes updating {{AppInfo}} to return a JAXB version > of the {{ResourceRequest}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
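A minimal sketch of the JAXB shape being discussed (the {{ResourceRequestInfo}} bean below is a hypothetical stand-in, not the attached patch): JAXB can marshal and unmarshal annotated beans as nested elements, whereas a pre-rendered String cannot be unmarshalled back into structured fields by downstream clients such as the Federation Router.

{code}
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

// Sketch: a JAXB-friendly view of the app's resource requests.
@XmlRootElement(name = "app")
@XmlAccessorType(XmlAccessType.FIELD)
public class AppInfoSketch {
  // Hypothetical bean mirroring a few ResourceRequest fields.
  @XmlAccessorType(XmlAccessType.FIELD)
  public static class ResourceRequestInfo {
    public int priority;
    public String resourceName;
    public int numContainers;
    public long memory;
    public int vCores;
  }

  // Serialized as nested XML/JSON elements rather than one opaque string,
  // so clients can unmarshal it back into structured objects.
  public List<ResourceRequestInfo> resourceRequests = new ArrayList<>();
}
{code}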
[jira] [Updated] (YARN-5391) FederationPolicy implementations (tying together RouterFederationPolicy and AMRMProxyFederationPolicy)
[ https://issues.apache.org/jira/browse/YARN-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-5391: --- Attachment: YARN-5391.02.patch > FederationPolicy implementations (tying together RouterFederationPolicy and > AMRMProxyFederationPolicy) > --- > > Key: YARN-5391 > URL: https://issues.apache.org/jira/browse/YARN-5391 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-5391.01.patch, YARN-5391.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5324) Stateless router policies implementation
[ https://issues.apache.org/jira/browse/YARN-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-5324: --- Attachment: YARN-5324.02.patch > Stateless router policies implementation > > > Key: YARN-5324 > URL: https://issues.apache.org/jira/browse/YARN-5324 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-5324.01.patch, YARN-5324.02.patch > > > These are policies at the Router that do not require maintaining state across > choices (e.g., weighted random). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5323) Policies APIs (for Router and AMRMProxy policies)
[ https://issues.apache.org/jira/browse/YARN-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-5323: --- Attachment: YARN-5323.03.patch > Policies APIs (for Router and AMRMProxy policies) > - > > Key: YARN-5323 > URL: https://issues.apache.org/jira/browse/YARN-5323 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-5323.01.patch, YARN-5323.02.patch, > YARN-5323.03.patch > > > This JIRA tracks APIs for the policies that will guide the Router and > AMRMProxy decisions on where to forward the job submission/query requests as > well as ResourceRequests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5325) Stateless AMRMProxy policies implementation
[ https://issues.apache.org/jira/browse/YARN-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-5325: --- Attachment: YARN-5325.02.patch > Stateless AMRMProxy policies implementation > > > Key: YARN-5325 > URL: https://issues.apache.org/jira/browse/YARN-5325 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-5325.01.patch, YARN-5325.02.patch > > > This JIRA tracks policies in the AMRMProxy that decide how to forward > ResourceRequests, without maintaining substantial state across decisions > (e.g., broadcast). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5350) Ensure LocalScheduler does not lose the sort order of allocatable nodes returned by the RM
[ https://issues.apache.org/jira/browse/YARN-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-5350: -- Attachment: YARN-5350.003.patch Thanks for the review, [~subru]. Updating the testcase with your suggestion. > Ensure LocalScheduler does not lose the sort order of allocatable nodes > returned by the RM > -- > > Key: YARN-5350 > URL: https://issues.apache.org/jira/browse/YARN-5350 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Fix For: 2.9.0 > > Attachments: YARN-5350.001.patch, YARN-5350.002.patch, > YARN-5350.003.patch > > > The LocalScheduler receives an ordered list of nodes from the RM with each > allocate call. This list, which is used by the LocalScheduler to allocate > OPPORTUNISTIC containers, is sorted on the nodes' free capacity (queue length > / wait time). > Unfortunately, the LocalScheduler stores this list in a HashMap, thereby > losing the sort order. > The trivial fix would be to replace the HashMap with a LinkedHashMap, which > retains the insertion order. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5392) Replace use of Priority in the Scheduling infrastructure with an opaque SchedulerKey
[ https://issues.apache.org/jira/browse/YARN-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385055#comment-15385055 ] Subru Krishnan commented on YARN-5392: -- Thanks [~asuresh] for working on this. I just have one question: should we have the {{SchedulerKey}} in addition to {{Priority}}? I feel {{Priority}} should be accessible directly as before outside of the scheduler layers, and the notion of {{SchedulerKey}} should be confined to the scheduler (ideally it should be transparent to other RM entities/services). An extreme example: at some point in the future we could decide not to use {{Priority}} in the {{SchedulerKey}} at all. Overall the patch LGTM. Since I have been working very closely with you, [~kasha]/[~leftnoteasy], can you guys take a look? > Replace use of Priority in the Scheduling infrastructure with an opaque > SchedulerKey > --- > > Key: YARN-5392 > URL: https://issues.apache.org/jira/browse/YARN-5392 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5392.001.patch, YARN-5392.002.patch, > YARN-5392.003.patch > > > Based on discussions in YARN-4888, this jira proposes to replace the use of > {{Priority}} in the Scheduler infrastructure (Scheduler, Queues, SchedulerApp > / Node etc.) with a more opaque and extensible {{SchedulerKey}}. > Note: Even though {{SchedulerKey}} will be used by the internal scheduling > infrastructure, it will not be exposed to the Client or the AM. The > SchedulerKey is meant to be an internal construct that is derived from > attributes of the ResourceRequest / ApplicationSubmissionContext / Scheduler > Configuration etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
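For readers following along, a sketch of the kind of opaque key under discussion (an illustration, not the attached patch): the key is constructed inside the scheduler and, for now, derives its ordering from {{Priority}} alone, so other attributes could later be folded in without touching the public {{Priority}} API.

{code}
// Illustrative sketch only: an opaque, comparable key the scheduler sorts
// on. Callers outside the scheduler never see or construct it.
public final class SchedulerKeySketch
    implements Comparable<SchedulerKeySketch> {
  private final int priority;

  private SchedulerKeySketch(int priority) {
    this.priority = priority;
  }

  // Derived from scheduler-visible attributes; today just the priority.
  public static SchedulerKeySketch create(int priority) {
    return new SchedulerKeySketch(priority);
  }

  @Override
  public int compareTo(SchedulerKeySketch other) {
    // In YARN a lower priority value is "more important".
    return Integer.compare(this.priority, other.priority);
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof SchedulerKeySketch
        && ((SchedulerKeySketch) o).priority == priority;
  }

  @Override
  public int hashCode() {
    return priority;
  }
}
{code}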
[jira] [Commented] (YARN-5350) Ensure LocalScheduler does not lose the sort order of allocatable nodes returned by the RM
[ https://issues.apache.org/jira/browse/YARN-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385047#comment-15385047 ] Subru Krishnan commented on YARN-5350: -- [~asuresh], +1 for the fix. Some minor feedback on the test: * it would be good to add an assertion on the total number of OPPORTUNISTIC containers. * add another check to ensure the sort order is maintained by doing a second round of allocation and/or requesting multiple OPPORTUNISTIC containers. Thanks. > Ensure LocalScheduler does not lose the sort order of allocatable nodes > returned by the RM > -- > > Key: YARN-5350 > URL: https://issues.apache.org/jira/browse/YARN-5350 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Fix For: 2.9.0 > > Attachments: YARN-5350.001.patch, YARN-5350.002.patch > > > The LocalScheduler receives an ordered list of nodes from the RM with each > allocate call. This list, which is used by the LocalScheduler to allocate > OPPORTUNISTIC containers, is sorted on the nodes' free capacity (queue length > / wait time). > Unfortunately, the LocalScheduler stores this list in a HashMap, thereby > losing the sort order. > The trivial fix would be to replace the HashMap with a LinkedHashMap, which > retains the insertion order. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
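The trivial fix in the description is easy to see in isolation. A self-contained demonstration (generic Java, not the patch itself) of how {{HashMap}} makes no iteration-order guarantee while {{LinkedHashMap}} iterates in insertion order, preserving the RM's sort:

{code}
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class NodeOrderDemo {
  public static void main(String[] args) {
    // Nodes as sorted by the RM on free capacity (best first).
    String[] sortedNodes = {"node3:queue=0", "node1:queue=2", "node2:queue=5"};

    Map<String, String> hash = new HashMap<>();         // order not guaranteed
    Map<String, String> linked = new LinkedHashMap<>(); // insertion order kept
    for (String n : sortedNodes) {
      String host = n.split(":")[0];
      hash.put(host, n);
      linked.put(host, n);
    }
    System.out.println("HashMap iteration:       " + hash.keySet());
    System.out.println("LinkedHashMap iteration: " + linked.keySet());
  }
}
{code}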
[jira] [Updated] (YARN-3477) TimelineClientImpl swallows exceptions
[ https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3477: Attachment: YARN-3477-trunk.003.patch [~ste...@apache.org] I rebased your patch to the latest trunk. Here's the rebased version. > TimelineClientImpl swallows exceptions > -- > > Key: YARN-3477 > URL: https://issues.apache.org/jira/browse/YARN-3477 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0, 2.7.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-3477-001.patch, YARN-3477-002.patch, > YARN-3477-trunk.003.patch > > > If the timeline client fails more than the retry count, the original exception > is not thrown. Instead, a generic runtime exception is raised saying "retries > run out". > # the failing exception should be rethrown, ideally via > NetUtils.wrapException, to include the URL of the failing endpoint > # otherwise, the raised RTE should (a) state that URL and (b) set the > original fault as the inner cause -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
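A sketch of the rethrow pattern item 1 asks for, using the real {{NetUtils.wrapException}} API but a hypothetical retry helper rather than the TimelineClientImpl code: the last failure is remembered and rethrown with the endpoint details attached, instead of being swallowed behind a generic "retries run out" exception.

{code}
import java.io.IOException;
import org.apache.hadoop.net.NetUtils;

public class RetryRethrowSketch {
  interface Op<T> {
    T run() throws IOException;
  }

  // Hypothetical helper: retry an operation, and on exhaustion rethrow the
  // original fault wrapped with the destination host/port for diagnosis.
  static <T> T retry(String host, int port, int maxRetries, Op<T> op)
      throws IOException {
    IOException lastFailure = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return op.run();
      } catch (IOException e) {
        lastFailure = e; // remember the original fault instead of dropping it
      }
    }
    // Wrap adds endpoint details and keeps the original fault as the cause.
    throw NetUtils.wrapException(host, port, "localhost", 0, lastFailure);
  }
}
{code}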
[jira] [Commented] (YARN-5394) Correct the wrong file name when mounting /etc/passwd to Docker Container
[ https://issues.apache.org/jira/browse/YARN-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384999#comment-15384999 ] Sidharta Seethana commented on YARN-5394: - /cc [~zyluo], [~vvasudev] As discussed in YARN-5360, I don't think it's a good idea to mount /etc/passwd into a container without any way to disable it. At a minimum, we should add a (cluster-wide?) mechanism to control this (and it should be disabled by default, IMO). > Correct the wrong file name when mounting /etc/passwd to Docker Container > - > > Key: YARN-5394 > URL: https://issues.apache.org/jira/browse/YARN-5394 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Zhankun Tang >Assignee: Zhankun Tang > Attachments: YARN-5394-branch-2.8.001.patch > > > The current LCE (DockerLinuxContainerRuntime) mounts /etc/passwd into the > container, but it uses the wrong file name "/etc/password" inside the container. > {panel} > .addMountLocation("/etc/passwd", "/etc/password:ro"); > {panel} > This causes the LCE to fail to launch the Docker container if the Docker image > doesn't create the same user name and UID in it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
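Putting the two points together, the filename fix plus an opt-in gate could look roughly like the fragment below. The configuration key is hypothetical (not an existing YARN property), and {{conf}}/{{runCommand}} are assumed to be the runtime's Hadoop Configuration and the Docker run-command builder quoted in the panel above:

{code}
// Sketch only: fix the target path typo and make the mount opt-in,
// defaulting to disabled as suggested in the comment above.
boolean mountPasswd = conf.getBoolean(
    "yarn.nodemanager.runtime.linux.docker.mount-etc-passwd", false);
if (mountPasswd) {
  // was: .addMountLocation("/etc/passwd", "/etc/password:ro");
  runCommand.addMountLocation("/etc/passwd", "/etc/passwd:ro");
}
{code}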
[jira] [Updated] (YARN-3662) Federation Membership State APIs
[ https://issues.apache.org/jira/browse/YARN-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-3662: - Attachment: YARN-3662-YARN-2915-v3.01.patch Reattaching patch after rebasing _YARN-2915_ branch to pull in HADOOP-13342 > Federation Membership State APIs > > > Key: YARN-3662 > URL: https://issues.apache.org/jira/browse/YARN-3662 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-3662-YARN-2915-v1.1.patch, > YARN-3662-YARN-2915-v1.patch, YARN-3662-YARN-2915-v2.patch, > YARN-3662-YARN-2915-v3.01.patch, YARN-3662-YARN-2915-v3.patch > > > The Federation Application State encapsulates the information about the > active RM of each sub-cluster that is participating in Federation. The > information includes addresses for ClientRM, ApplicationMaster and Admin > services along with the sub_cluster _capability_ which is currently defined > by *ClusterMetricsInfo*. Please refer to the design doc in parent JIRA for > further details. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5340) Race condition in RollingLevelDBTimelineStore#getAndSetStartTime()
[ https://issues.apache.org/jira/browse/YARN-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384964#comment-15384964 ] Vinod Kumar Vavilapalli commented on YARN-5340: --- Tx for the update, [~gtCarrera9]. This definitely looks slightly better than the previous one. I'll check this in if Jenkins says okay. > Race condition in RollingLevelDBTimelineStore#getAndSetStartTime() > -- > > Key: YARN-5340 > URL: https://issues.apache.org/jira/browse/YARN-5340 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Li Lu >Priority: Critical > Attachments: YARN-5340-trunk.001.patch, YARN-5340-trunk.002.patch > > > App Name/User/RPC Port/AM Host info is missing from ATS web service or YARN > CLI's app info > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn --config > /tmp/hadoopConf application -status application_1467931619679_0001 > Application Report : > Application-Id : application_1467931619679_0001 > Application-Name : null > Application-Type : null > User : null > Queue : null > Application Priority : null > Start-Time : 0 > Finish-Time : 1467931672057 > Progress : 100% > State : FINISHED > Final-State : SUCCEEDED > Tracking-URL : N/A > RPC Port : -1 > AM Host : N/A > Aggregate Resource Allocation : 290014 MB-seconds, 74 vcore-seconds > Log Aggregation Status : N/A > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
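The attached patch is not reproduced here, but the bug class named in the title is a check-then-act race: two writers both see "no start time yet" and the later write clobbers the earlier one. A generic illustration of closing such a race with an atomic map operation (a hypothetical in-memory cache, not the RollingLevelDBTimelineStore code):

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StartTimeCacheSketch {
  private final ConcurrentMap<String, Long> startTimes =
      new ConcurrentHashMap<>();

  // Atomically install the candidate only if no value exists, and always
  // return the winning value, so concurrent callers agree on one start time.
  public long getAndSetStartTime(String entityId, long candidate) {
    return startTimes.computeIfAbsent(entityId, id -> candidate);
  }
}
{code}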
[jira] [Updated] (YARN-5092) TestRMDelegationTokens fails intermittently
[ https://issues.apache.org/jira/browse/YARN-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5092: - Attachment: YARN-5092.002.patch Thanks for the review, Rohith! Good catch on the rm1.stop() suggestion. Clearing the queue metrics wasn't causing the default metrics exception; that was the failure to stop. Clearing the queue metrics fixed the class cast exception, but I went ahead with your suggestion to remove the scheduler setting as another way to fix it, since it seemed unrelated to the test. Also added the setLoginUser change in the setup method. Tested both orderings of the tests. > TestRMDelegationTokens fails intermittently > > > Key: YARN-5092 > URL: https://issues.apache.org/jira/browse/YARN-5092 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 2.7.2 >Reporter: Rohith Sharma K S >Assignee: Jason Lowe > Attachments: YARN-5092.001.patch, YARN-5092.002.patch > > > In build > [link|https://builds.apache.org/job/PreCommit-YARN-Build/11476/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_101.txt] > , TestRMDelegationTokens fails for 2 test cases > # TestRMDelegationTokens.testRMDTMasterKeyStateOnRollingMasterKey > # TestRMDelegationTokens.testRemoveExpiredMasterKeyInRMStateStore -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5203) Return ResourceRequest JAXB object in ResourceManager Cluster Applications REST API
[ https://issues.apache.org/jira/browse/YARN-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ellen Hui updated YARN-5203: Attachment: YARN-5203.v1.patch Add ExecutionType, raw ResourceRequests for backwards compatibility. > Return ResourceRequest JAXB object in ResourceManager Cluster Applications > REST API > --- > > Key: YARN-5203 > URL: https://issues.apache.org/jira/browse/YARN-5203 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Subru Krishnan >Assignee: Ellen Hui > Attachments: YARN-5203.v0.patch, YARN-5203.v1.patch > > > The ResourceManager Cluster Applications REST API returns {{ResourceRequest}} > as String rather than a JAXB object. This prevents downstream tools like > Federation Router (YARN-3659) that depend on the REST API to unmarshall the > {{AppInfo}}. This JIRA proposes updating {{AppInfo}} to return a JAXB version > of the {{ResourceRequest}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-679) add an entry point that can start any Yarn service
[ https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384882#comment-15384882 ] Hadoop QA commented on YARN-679: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 22 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 35s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 8m 35s {color} | {color:red} root generated 8 new + 709 unchanged - 0 fixed = 717 total (was 709) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s {color} | {color:red} hadoop-common-project/hadoop-common: The patch generated 42 new + 119 unchanged - 34 fixed = 161 total (was 153) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 73 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 22m 54s {color} | {color:red} hadoop-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 61m 5s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Timed out junit tests | org.apache.hadoop.http.TestHttpServerLifecycle | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818866/YARN-679-010.patch | | JIRA Issue | YARN-679 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 78053710b77c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cda0a28 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/12374/artifact/patchprocess/diff-compile-javac-root.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/12374/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/12374/artifact/patchprocess/whitespace-eol.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/12374/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt | | unit test logs | https://builds.apache.org/job/PreCommit-YARN-Build/12374/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12374/testReport/ | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12374/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (YARN-5352) Allow container-executor to use private /tmp
[ https://issues.apache.org/jira/browse/YARN-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384856#comment-15384856 ] Hadoop QA commented on YARN-5352: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 8s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 12m 52s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 36m 27s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.TestDirectoryCollection | | | hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818857/YARN-5352-v0.patch | | JIRA Issue | YARN-5352 | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux 4ec1f4873ecf 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cda0a28 | | Default Java | 1.8.0_91 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/12375/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit test logs | https://builds.apache.org/job/PreCommit-YARN-Build/12375/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12375/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12375/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Allow container-executor to use private /tmp > - > > Key: YARN-5352 > URL: https://issues.apache.org/jira/browse/YARN-5352 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-5352-v0.patch > > > It's very common for user code to create things in /tmp. Yes, applications > have means to specify alternate tmp directories but doing so is opt-in and > therefore doesn't happen in many cases. At a minimum, linux can use private > namespaces to create a private /tmp for each container so that it's using the > same space allocated to containers and it's automatically cleaned up as part > of container clean-up.
[jira] [Updated] (YARN-5137) Make DiskChecker pluggable in NodeManager
[ https://issues.apache.org/jira/browse/YARN-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-5137: --- Attachment: YARN-5137.003.patch > Make DiskChecker pluggable in NodeManager > - > > Key: YARN-5137 > URL: https://issues.apache.org/jira/browse/YARN-5137 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Ray Chiang >Assignee: Yufei Gu > Labels: supportability > Attachments: YARN-5137.001.patch, YARN-5137.002.patch, > YARN-5137.003.patch > > > It would be nice to have the option for a DiskChecker that has more > sophisticated checking capabilities. In order to do this, we would first > need DiskChecker to be pluggable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5181) ClusterNodeTracker: add method to get list of nodes matching a specific resourceName
[ https://issues.apache.org/jira/browse/YARN-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384804#comment-15384804 ] Hudson commented on YARN-5181: -- SUCCESS: Integrated in Hadoop-trunk-Commit #10119 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10119/]) YARN-5181. ClusterNodeTracker: add method to get list of nodes matching (arun suresh: rev cda0a280ddd0c77af93d236fc80478c16bbe809a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ClusterNodeTracker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestClusterNodeTracker.java > ClusterNodeTracker: add method to get list of nodes matching a specific > resourceName > > > Key: YARN-5181 > URL: https://issues.apache.org/jira/browse/YARN-5181 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Fix For: 2.9.0 > > Attachments: yarn-5181-1.patch, yarn-5181-2.patch, yarn-5181-3.patch > > > ClusterNodeTracker should have a method to return the list of nodes matching > a particular resourceName. This is so we could identify what all nodes a > particular ResourceRequest is interested in, which in turn is useful in > YARN-5139 (global scheduler) and YARN-4752 (FairScheduler preemption > overhaul). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
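To make the committed idea concrete, here is a minimal, self-contained sketch of resolving a resourceName (ANY, a rack, or a single host) to the matching nodes. The class shape and field names below are illustrative assumptions, not the actual ClusterNodeTracker internals:

{code}
// Hypothetical sketch of a nodes-by-resourceName lookup. Names are
// illustrative, not the committed ClusterNodeTracker API.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NodeTrackerSketch<N> {
  private final Map<String, N> nodesByHost = new HashMap<>();
  private final Map<String, List<N>> nodesByRack = new HashMap<>();

  public List<N> getNodesByResourceName(String resourceName) {
    List<N> result = new ArrayList<>();
    if ("*".equals(resourceName)) {                   // ResourceRequest.ANY
      result.addAll(nodesByHost.values());
    } else if (nodesByRack.containsKey(resourceName)) {
      result.addAll(nodesByRack.get(resourceName));   // rack-local nodes
    } else if (nodesByHost.containsKey(resourceName)) {
      result.add(nodesByHost.get(resourceName));      // node-local match
    }
    return result;                                    // empty if no match
  }
}
{code}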
[jira] [Updated] (YARN-5340) Race condition in RollingLevelDBTimelineStore#getAndSetStartTime()
[ https://issues.apache.org/jira/browse/YARN-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-5340: Attachment: YARN-5340-trunk.002.patch Thanks for the review [~djp]! I've shrunk the size of the critical section. Now we only synchronize globally when there is a cache miss and we have to check and update the timestamp in leveldb. > Race condition in RollingLevelDBTimelineStore#getAndSetStartTime() > -- > > Key: YARN-5340 > URL: https://issues.apache.org/jira/browse/YARN-5340 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Li Lu >Priority: Critical > Attachments: YARN-5340-trunk.001.patch, YARN-5340-trunk.002.patch > > > App Name/User/RPC Port/AM Host info is missing from ATS web service or YARN > CLI's app info > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn --config > /tmp/hadoopConf application -status application_1467931619679_0001 > Application Report : > Application-Id : application_1467931619679_0001 > Application-Name : null > Application-Type : null > User : null > Queue : null > Application Priority : null > Start-Time : 0 > Finish-Time : 1467931672057 > Progress : 100% > State : FINISHED > Final-State : SUCCEEDED > Tracking-URL : N/A > RPC Port : -1 > AM Host : N/A > Aggregate Resource Allocation : 290014 MB-seconds, 74 vcore-seconds > Log Aggregation Status : N/A > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
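As a quick illustration of the cache-miss-only synchronization described above (a minimal, self-contained sketch, not the RollingLevelDBTimelineStore patch itself; the field and method names are assumptions), the pattern is a lock-free cache read on the common path, with the global lock taken only once per missing entity:

{code}
// Sketch: synchronize only on a cache miss; re-check under the lock so
// concurrent writers for the same entity agree on one start time.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StartTimeCacheSketch {
  private final ConcurrentMap<String, Long> startTimeWriteCache =
      new ConcurrentHashMap<>();
  private final Object lock = new Object();

  long getAndSetStartTime(String entityId, long candidateStartTime) {
    Long cached = startTimeWriteCache.get(entityId);  // lock-free fast path
    if (cached != null) {
      return cached;
    }
    synchronized (lock) {                             // only on cache miss
      cached = startTimeWriteCache.get(entityId);     // re-check under lock
      if (cached != null) {
        return cached;
      }
      long stored = readOrWriteLevelDb(entityId, candidateStartTime);
      startTimeWriteCache.put(entityId, stored);
      return stored;
    }
  }

  // Stand-in for the leveldb check-and-write; returns the winning start time.
  private long readOrWriteLevelDb(String entityId, long candidate) {
    return candidate;
  }
}
{code}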
[jira] [Commented] (YARN-5360) Decouple host user and Docker container user
[ https://issues.apache.org/jira/browse/YARN-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384786#comment-15384786 ] Sidharta Seethana commented on YARN-5360: - [~templedf], running a container as root does in fact have security implications (there are other things to consider in conjunction with this - capabilities, selinux and so on). There are (at least) a couple of reasons why --user is enforced currently: 1) the YARN security model requires the launched process to run as the designated user 2) Log aggregation/local permissions etc. - some of these things would stop working if the generated logs have ownership that is different from what YARN expects. These are also the reasons that need to be considered for YARN-4266 > Decouple host user and Docker container user > > > Key: YARN-5360 > URL: https://issues.apache.org/jira/browse/YARN-5360 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Zhankun Tang >Assignee: Zhankun Tang > > There is *a dependency between the job-submitting user and the user in the Docker > image* in LCE currently. For instance, in order to run the Docker container > as the yarn user, we can choose to set > "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user" to yarn > and leave > "yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users" at its > default (true). Then LCE will choose yarn (UID maybe 1001) as the user > running jobs. > LCE will mount the generated launch_container.sh (owned by the running job > user) and /etc/passwd (*currently the code mounts to the container's > /etc/password, which I think is a mistake*) into the Docker container and > utilizes the "docker run --user=" option to get it done internally. > Mounting /etc/passwd into the container is not a good choice because it overrides > the original users defined in the Docker image. As far as I know, since Docker v1.8 > (or maybe earlier), the Docker run command "--user=" option accepts a UID, and > *when passing a UID, the user does not have to exist in the container*. So we > could use the UID instead of the user name to construct the Docker run command to > eliminate the dependency of creating the same user in the Docker image. This > gives LCE the ability to launch any Docker container safely regardless of what > users are in it. > But this is not enough to decouple the host user and the Docker container user. The > final solution we are searching for is focused on allowing users to run > their Docker images flexibly without involving dependencies on YARN, while making > sure the container won't bring in security risks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
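To illustrate the UID-based approach proposed in the description, here is a hedged, self-contained sketch (not LCE's actual code; the class, image name, and UID resolution via the POSIX {{id -u}} command are illustrative assumptions):

{code}
// Sketch: resolve the submitting user to a numeric UID on the host and
// pass it to "docker run --user=", so the user need not exist in the image
// and no /etc/passwd mount is required.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

public class DockerUserSketch {
  static String resolveUid(String user) throws IOException {
    // Shells out to the POSIX "id -u <user>" command to get the UID.
    Process p = new ProcessBuilder("id", "-u", user).start();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream()))) {
      return r.readLine().trim();
    }
  }

  public static void main(String[] args) throws IOException {
    String uid = resolveUid("yarn");          // e.g. "1001"
    List<String> cmd = Arrays.asList(
        "docker", "run", "--user=" + uid,
        "some-image", "launch_container.sh"); // hypothetical image/script
    System.out.println(cmd);
  }
}
{code}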
[jira] [Commented] (YARN-3662) Federation Membership State APIs
[ https://issues.apache.org/jira/browse/YARN-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384783#comment-15384783 ] Hadoop QA commented on YARN-3662: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 2s {color} | {color:red} Docker failed to build yetus/hadoop:e2f6409. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818722/YARN-3662-YARN-2915-v3.patch | | JIRA Issue | YARN-3662 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12373/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Federation Membership State APIs > > > Key: YARN-3662 > URL: https://issues.apache.org/jira/browse/YARN-3662 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Attachments: YARN-3662-YARN-2915-v1.1.patch, > YARN-3662-YARN-2915-v1.patch, YARN-3662-YARN-2915-v2.patch, > YARN-3662-YARN-2915-v3.patch > > > The Federation Application State encapsulates the information about the > active RM of each sub-cluster that is participating in Federation. The > information includes addresses for ClientRM, ApplicationMaster and Admin > services along with the sub_cluster _capability_ which is currently defined > by *ClusterMetricsInfo*. Please refer to the design doc in parent JIRA for > further details. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384758#comment-15384758 ] Vrushali C commented on YARN-5382: -- I see, thanks, sounds good. Will do that. > RM does not audit log kill request for active applications > -- > > Key: YARN-5382 > URL: https://issues.apache.org/jira/browse/YARN-5382 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Jason Lowe >Assignee: Vrushali C > Attachments: YARN-5382-branch-2.7.01.patch, > YARN-5382-branch-2.7.02.patch > > > ClientRMService will audit a kill request but only if it either fails to > issue the kill or if the kill is sent to an already finished application. It > does not create a log entry when the application is active which is arguably > the most important case to audit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5404) Add the ability to split reverse zone subnets
Shane Kumpf created YARN-5404: - Summary: Add the ability to split reverse zone subnets Key: YARN-5404 URL: https://issues.apache.org/jira/browse/YARN-5404 Project: Hadoop YARN Issue Type: Sub-task Reporter: Shane Kumpf Assignee: Shane Kumpf In some environments, the entire container subnet may not be used exclusively by containers (i.e. the YARN nodemanager host IPs may also be part of the larger subnet). As a result, the reverse lookup zones created by the YARN Registry DNS server may not match those created on the forwarders. For example:
Network: 172.27.0.0
Subnet: 255.255.248.0
Hosts: 0.27.172.in-addr.arpa 1.27.172.in-addr.arpa 2.27.172.in-addr.arpa 3.27.172.in-addr.arpa
Containers: 4.27.172.in-addr.arpa 5.27.172.in-addr.arpa 6.27.172.in-addr.arpa 7.27.172.in-addr.arpa
Since the total IP count is greater than 256, YARN Registry DNS only allows for creating: 27.172.in-addr.arpa
Provide configuration to further subdivide the subnets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
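As a small illustration of the proposed subdivision (a sketch only, using the /21 network from the example above; this is not the Registry DNS code), a subnet wider than /24 can be split into one in-addr.arpa zone per /24:

{code}
// Sketch: enumerate the per-/24 reverse zones for a wider network, so
// host and container ranges can be assigned to different zones.
public class ReverseZoneSketch {
  public static void main(String[] args) {
    int[] network = {172, 27, 0, 0};
    int prefixLen = 21;                        // netmask 255.255.248.0
    int slash24Count = 1 << (24 - prefixLen);  // 8 zones for a /21
    for (int i = 0; i < slash24Count; i++) {
      int thirdOctet = network[2] + i;
      // Prints 0.27.172.in-addr.arpa through 7.27.172.in-addr.arpa
      System.out.printf("%d.%d.%d.in-addr.arpa%n",
          thirdOctet, network[1], network[0]);
    }
  }
}
{code}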
[jira] [Commented] (YARN-5360) Decouple host user and Docker container user
[ https://issues.apache.org/jira/browse/YARN-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384777#comment-15384777 ] Sidharta Seethana commented on YARN-5360: - [~zyluo], {quote} I think this is inconsistent with Docker's motto to "build, ship and run". There is no point of using Docker if the user has to use every image as a base to add the correct user. {quote} While that may be Docker's motto, the objective of YARN-3611, in my opinion, has never been to use docker for docker's sake - we needed to adapt it to the YARN/hadoop world - hadoop security, log aggregation, localization - all of these need to work. > Decouple host user and Docker container user > > > Key: YARN-5360 > URL: https://issues.apache.org/jira/browse/YARN-5360 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Zhankun Tang >Assignee: Zhankun Tang > > There is *a dependency between the job-submitting user and the user in the Docker > image* in LCE currently. For instance, in order to run the Docker container > as the yarn user, we can choose to set > "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user" to yarn > and leave > "yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users" at its > default (true). Then LCE will choose yarn (UID maybe 1001) as the user > running jobs. > LCE will mount the generated launch_container.sh (owned by the running job > user) and /etc/passwd (*currently the code mounts to the container's > /etc/password, which I think is a mistake*) into the Docker container and > utilizes the "docker run --user=" option to get it done internally. > Mounting /etc/passwd into the container is not a good choice because it overrides > the original users defined in the Docker image. As far as I know, since Docker v1.8 > (or maybe earlier), the Docker run command "--user=" option accepts a UID, and > *when passing a UID, the user does not have to exist in the container*. So we > could use the UID instead of the user name to construct the Docker run command to > eliminate the dependency of creating the same user in the Docker image. This > gives LCE the ability to launch any Docker container safely regardless of what > users are in it. > But this is not enough to decouple the host user and the Docker container user. The > final solution we are searching for is focused on allowing users to run > their Docker images flexibly without involving dependencies on YARN, while making > sure the container won't bring in security risks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384722#comment-15384722 ] Jason Lowe commented on YARN-5382: -- bq. Will update the patch to include auditing of killing of active apps only. Actually I think we should go with Jian's suggestion. Auditing active apps could still generate duplicate events if the event dispatch is delayed, and Jian's suggestion means we'll only log it once when the app transitions from active to starting the kill processing. We will need to enhance the kill event to include the requesting user and remote IP address so it can be audit logged properly within the RMAppImpl transition. > RM does not audit log kill request for active applications > -- > > Key: YARN-5382 > URL: https://issues.apache.org/jira/browse/YARN-5382 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Jason Lowe >Assignee: Vrushali C > Attachments: YARN-5382-branch-2.7.01.patch, > YARN-5382-branch-2.7.02.patch > > > ClientRMService will audit a kill request but only if it either fails to > issue the kill or if the kill is sent to an already finished application. It > does not create a log entry when the application is active which is arguably > the most important case to audit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
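To make the proposed enhancement concrete, here is a hypothetical sketch of a kill event carrying the caller information for the audit log. All class and field names below are illustrative assumptions, not the actual RM event types:

{code}
// Sketch: a kill event that carries the requesting user and remote IP,
// so the KILLING transition can audit exactly once with real caller info
// instead of relying on an RPC context that is no longer present.
import java.net.InetAddress;

public class AppKillEventSketch {
  private final String applicationId;
  private final String diagnostics;
  private final String callerUser;     // requesting user, for the audit log
  private final InetAddress callerIp;  // remote IP, for the audit log

  public AppKillEventSketch(String applicationId, String diagnostics,
      String callerUser, InetAddress callerIp) {
    this.applicationId = applicationId;
    this.diagnostics = diagnostics;
    this.callerUser = callerUser;
    this.callerIp = callerIp;
  }

  // Illustrative audit line built from the explicitly supplied caller bits.
  public String auditLine() {
    return String.format("USER=%s IP=%s OPERATION=Kill Application Request "
        + "APPID=%s DIAGNOSTICS=%s", callerUser,
        callerIp == null ? "N/A" : callerIp.getHostAddress(),
        applicationId, diagnostics);
  }
}
{code}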
[jira] [Comment Edited] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity
[ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374343#comment-15374343 ] Wangda Tan edited comment on YARN-4091 at 7/19/16 6:33 PM: --- Hi all, Given "YARN-4091.preliminary.1.patch" I uploaded above, here are some brief descriptions of the newly added classes and the test REST API. Newly Added Classes: ActivityManager: - A class to store node or application allocations. It mainly contains operations for allocation start, add, update and finish. NodeAllocation: - It contains allocation information for one allocation in a node heartbeat. Detailed allocation activities are first stored in "AllocationActivity" as operations, then transformed to a tree structure. The tree structure starts from the root queue and ends in a leaf queue, application, or container allocation. AllocationActivity: - It records an activity operation in allocation, which can be classified as a queue, application or container activity. Other information includes state, diagnostic, and priority. ActivityNode: - It represents a tree node in the "NodeAllocation" tree structure. Each node may represent a queue, application or container in allocation activity. A node may have child nodes if allocation successfully proceeds to the next level. ActivityDiagnosticConstant: - Collection of diagnostics. ActivityState: - Collection of activity operation states. AllocationState: - Collection of allocation final states. AllocationActivityType: - Collection of types for activity operation. AppAllocation: - It contains allocation information for one application within a period of time. Each application allocation may have several allocation attempts. ActivitiesInfo: - DAO object to display node allocation activity. NodeAllocationInfo: - DAO object to display each node allocation in a node heartbeat. ActivityNodeInfo: - DAO object to display node information in the allocation tree. It corresponds to the "ActivityNode" class. AppActivitiesInfo: - DAO object to display application activity. AppAllocationInfo: - DAO object to display detailed application allocation information. Test REST API: - Look at the next node's activities (by default): http://localhost:18088/ws/v1/cluster/scheduler/activities - Only look at a specific node: http://localhost:18088/ws/v1/cluster/scheduler/activities?nodeId=node-87:75 OR without port number http://localhost:18088/ws/v1/cluster/scheduler/activities?nodeId=node-87 - Look at activities for a specific application within a period of time (3s by default): http://localhost:18088/ws/v1/cluster/scheduler/app-activities?appId=application_1468198570845_0022, http://localhost:18088/ws/v1/cluster/scheduler/app-activities?appId=application_1468198570845_0022=5.2 Test class: - TestRMWebServicesCapacitySched.java org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched#testActivityJSON org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched#testAppActivityJSON Thanks for review. Please feel free to put forward any suggestions for improvements.
[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384627#comment-15384627 ] Vrushali C commented on YARN-5382: -- Thanks [~jlowe] and [~jianhe]! Will update the patch to include auditing of killing of active apps only. > RM does not audit log kill request for active applications > -- > > Key: YARN-5382 > URL: https://issues.apache.org/jira/browse/YARN-5382 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Jason Lowe >Assignee: Vrushali C > Attachments: YARN-5382-branch-2.7.01.patch, > YARN-5382-branch-2.7.02.patch > > > ClientRMService will audit a kill request but only if it either fails to > issue the kill or if the kill is sent to an already finished application. It > does not create a log entry when the application is active which is arguably > the most important case to audit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5264) Use FSQueue to store queue-specific information
[ https://issues.apache.org/jira/browse/YARN-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-5264: --- Attachment: YARN-5264.001.patch > Use FSQueue to store queue-specific information > --- > > Key: YARN-5264 > URL: https://issues.apache.org/jira/browse/YARN-5264 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-5264.001.patch > > > Use FSQueue to store queue-specific information instead of querying > AllocationConfiguration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5392) Replace use of Priority in the Scheduling infrastructure with an opaque SchedulerKey
[ https://issues.apache.org/jira/browse/YARN-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384596#comment-15384596 ] Arun Suresh commented on YARN-5392: --- ping [~kasha], [~subru]... wondering if you might be able to give this a quick look. > Replace use of Priority in the Scheduling infrastructure with an opaque > SchedulerKey > --- > > Key: YARN-5392 > URL: https://issues.apache.org/jira/browse/YARN-5392 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5392.001.patch, YARN-5392.002.patch, > YARN-5392.003.patch > > > Based on discussions in YARN-4888, this jira proposes to replace the use of > {{Priority}} in the Scheduler infrastructure (Scheduler, Queues, SchedulerApp > / Node etc.) with a more opaque and extensible {{SchedulerKey}}. > Note: Even though {{SchedulerKey}} will be used by the internal scheduling > infrastructure, it will not be exposed to the Client or the AM. The > SchedulerKey is meant to be an internal construct that is derived from > attributes of the ResourceRequest / ApplicationSubmissionContext / Scheduler > Configuration etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
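For context, here is a rough, hypothetical sketch of the "opaque key" idea (the real patch defines its own SchedulerKey type; this illustration assumes priority is the sole ingredient today, with room to add more):

{code}
// Sketch: scheduler internals compare opaque keys without knowing they
// are (currently) derived from Priority. Names are illustrative only.
public final class SchedulerKeySketch
    implements Comparable<SchedulerKeySketch> {
  private final int priority;  // more attributes could be folded in later

  private SchedulerKeySketch(int priority) {
    this.priority = priority;
  }

  public static SchedulerKeySketch create(int priority) {
    return new SchedulerKeySketch(priority);
  }

  @Override
  public int compareTo(SchedulerKeySketch other) {
    return Integer.compare(other.priority, this.priority); // higher first
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof SchedulerKeySketch
        && ((SchedulerKeySketch) o).priority == priority;
  }

  @Override
  public int hashCode() {
    return priority;
  }
}
{code}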
[jira] [Updated] (YARN-679) add an entry point that can start any Yarn service
[ https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-679: Attachment: YARN-679-010.patch Patch 010 addresses complaints from checkstyle *as far as I consider necessary.* Specifically, sometimes it is better if lines do go beyond 80 chars, and you can be a bit less rigorous in test code than in production about accessibility of fields. The complaints about IrqHandler using forbidden classes are valid; it's intended to be a single place for this. Ultimately, other uses in the Hadoop code could be replaced with this. > add an entry point that can start any Yarn service > -- > > Key: YARN-679 > URL: https://issues.apache.org/jira/browse/YARN-679 > Project: Hadoop YARN > Issue Type: New Feature > Components: api >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-679-001.patch, YARN-679-002.patch, > YARN-679-002.patch, YARN-679-003.patch, YARN-679-004.patch, > YARN-679-005.patch, YARN-679-006.patch, YARN-679-007.patch, > YARN-679-008.patch, YARN-679-009.patch, YARN-679-010.patch, > org.apache.hadoop.servic...mon 3.0.0-SNAPSHOT API).pdf > > Time Spent: 72h > Remaining Estimate: 0h > > There's no need to write separate .main classes for every Yarn service, given > that the startup mechanism should be identical: create, init, start, wait for > stopped - with an interrupt handler to trigger a clean shutdown on a control-c > interrupt. > Provide one that takes any classname, and a list of config files/options -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
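A minimal sketch of the entry-point idea described in this issue, assuming only the public org.apache.hadoop.service.Service lifecycle (the patch itself adds much richer configuration and argument handling; this is not the patch's ServiceLauncher):

{code}
// Sketch: instantiate any Service by classname, init/start it, and hook
// a clean shutdown on control-c via a JVM shutdown hook.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.Service;

public class ServiceLauncherSketch {
  public static void main(String[] args) throws Exception {
    String classname = args[0];
    final Service service =
        (Service) Class.forName(classname).newInstance();
    Runtime.getRuntime().addShutdownHook(new Thread() {
      @Override
      public void run() {
        service.stop();            // triggered on SIGINT/SIGTERM
      }
    });
    service.init(new Configuration());
    service.start();
    service.waitForServiceToStop(Long.MAX_VALUE); // block until stopped
  }
}
{code}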
[jira] [Commented] (YARN-5401) yarn application kill does not let mapreduce jobs show up in jobhistory
[ https://issues.apache.org/jira/browse/YARN-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384529#comment-15384529 ] Jason Lowe commented on YARN-5401: -- Yes, if an application framework provides a kill command then that should be preferred over the yarn kill approach. The MapReduce framework kill will automatically fall back to the yarn kill if the application master is unresponsive or if the job fails to enter the killed state within a configurable amount of time (controlled via yarn.app.mapreduce.am.hard-kill-timeout-ms). > yarn application kill does not let mapreduce jobs show up in jobhistory > --- > > Key: YARN-5401 > URL: https://issues.apache.org/jira/browse/YARN-5401 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn > Environment: centos 6.6 > apache hadoop 2.6.4 >Reporter: Nikhil Mulley > > Hi, > It's been found in our cluster running apache hadoop 2.6.4 that while the > mapreduce jobs that are killed with the 'hadoop job -kill' command do end up having > the job and its counters reported to the jobhistory server, when 'yarn application > -kill' is used on a mapreduce application the job does not show up in the jobhistory > server interface. > Is this intentional? If so, any particular reasons? > It would be better to have mapreduce application history reported on > jobhistory irrespective of whether the kill is performed using the yarn application > cli or the hadoop job cli. > thanks, > Nikhil -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5342) Improve non-exclusive node partition resource allocation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384522#comment-15384522 ] Wangda Tan commented on YARN-5342: -- [~Naganarasimha], bq. Because in next NonExclusive mode allocation for the node of this partition might skip this application for which reset happened but might allocate to another application but still that partition might have pending resource requests. IIUC, we now do allocation twice for a shareable node partition: the first is for exclusive allocation and the second is for shareable allocation. This already implicitly confirms that the non-exclusive allocation is safe. Please let me know if I missed anything. I want to check this patch in as soon as possible for 2.8 and do more comprehensive work in follow-up JIRAs. Thanks, > Improve non-exclusive node partition resource allocation in Capacity Scheduler > -- > > Key: YARN-5342 > URL: https://issues.apache.org/jira/browse/YARN-5342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: YARN-5342.1.patch, YARN-5342.2.patch > > > In the previous implementation, one non-exclusive container allocation is > possible when the missed-opportunity >= #cluster-nodes. And > missed-opportunity will be reset when a container is allocated to any node. > This will slow down the frequency of container allocation on a non-exclusive > node partition: *When a non-exclusive partition=x has idle resources, we can > only allocate one container for this app in every > X = #nodemanagers * heartbeat-interval secs for the whole cluster.* > In this JIRA, I propose a fix to reset missed-opportunity only if we have >0 > pending resources for the non-exclusive partition OR we get an allocation from > the default partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
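A hedged sketch of the reset rule proposed in the description (the structure and names below are illustrative, not the CapacityScheduler code): reset the missed-opportunity counter only when progress is still possible elsewhere, so non-exclusive nodes stay eligible instead of being throttled to one allocation per heartbeat sweep.

{code}
// Sketch of the proposed reset rule for the missed-opportunity counter.
public class MissedOpportunitySketch {
  private int missedOpportunities;

  void onAllocation(boolean fromDefaultPartition,
      long pendingOnNonExclusivePartition) {
    if (fromDefaultPartition || pendingOnNonExclusivePartition > 0) {
      missedOpportunities = 0;   // safe to reset: progress still possible
    }
    // otherwise keep the counter so non-exclusive nodes stay eligible
  }

  void onMiss() {
    missedOpportunities++;       // incremented once per skipped node
  }

  boolean canUseNonExclusiveNode(int clusterNodeCount) {
    return missedOpportunities >= clusterNodeCount;
  }
}
{code}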
[jira] [Commented] (YARN-5401) yarn application kill does not let mapreduce jobs show up in jobhistory
[ https://issues.apache.org/jira/browse/YARN-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384516#comment-15384516 ] Nikhil Mulley commented on YARN-5401: - So, should apps always use app-specific methods to kill their jobs, and never use yarn kill unless really necessary? (Like always using kill (TERM) unless kill -9 becomes necessary.) > yarn application kill does not let mapreduce jobs show up in jobhistory > --- > > Key: YARN-5401 > URL: https://issues.apache.org/jira/browse/YARN-5401 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn > Environment: centos 6.6 > apache hadoop 2.6.4 >Reporter: Nikhil Mulley > > Hi, > It's been found in our cluster running apache hadoop 2.6.4 that while the > mapreduce jobs that are killed with the 'hadoop job -kill' command do end up having > the job and its counters reported to the jobhistory server, when 'yarn application > -kill' is used on a mapreduce application the job does not show up in the jobhistory > server interface. > Is this intentional? If so, any particular reasons? > It would be better to have mapreduce application history reported on > jobhistory irrespective of whether the kill is performed using the yarn application > cli or the hadoop job cli. > thanks, > Nikhil -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3477) TimelineClientImpl swallows exceptions
[ https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384510#comment-15384510 ] Li Lu commented on YARN-3477: - Sure. Let me find some time to finish this. Thanks [~ste...@apache.org]! > TimelineClientImpl swallows exceptions > -- > > Key: YARN-3477 > URL: https://issues.apache.org/jira/browse/YARN-3477 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0, 2.7.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-3477-001.patch, YARN-3477-002.patch > > > If the timeline client fails more than the retry count, the original exception is > not thrown. Instead, some runtime exception is raised saying "retries run out" > # The failing exception should be rethrown, ideally via > NetUtils.wrapException, to include the URL of the failing endpoint > # Otherwise, the raised RTE should (a) state that URL and (b) set the > original fault as the inner cause -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
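A sketch of the requested rethrow behavior, assuming NetUtils.wrapException's (destHost, destPort, localHost, localPort, exception) signature; the retry loop and callable type below are simplified illustrations, not the TimelineClientImpl code:

{code}
// Sketch: when retries run out, rethrow the original fault wrapped with
// the failing endpoint's address instead of a bare RuntimeException.
import java.io.IOException;
import java.net.InetAddress;
import org.apache.hadoop.net.NetUtils;

public class RetryRethrowSketch {
  interface IoCall<T> {
    T call() throws IOException;
  }

  static <T> T runWithRetries(int maxRetries, String host, int port,
      IoCall<T> op) throws IOException {
    IOException lastFailure = null;
    for (int i = 0; i <= maxRetries; i++) {
      try {
        return op.call();
      } catch (IOException e) {
        lastFailure = e;           // remember the original fault
      }
    }
    // Annotate with source/destination details and rethrow the real cause.
    throw NetUtils.wrapException(host, port,
        InetAddress.getLocalHost().getHostName(), 0, lastFailure);
  }
}
{code}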
[jira] [Commented] (YARN-5352) Allow container-executor to use private /tmp
[ https://issues.apache.org/jira/browse/YARN-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384496#comment-15384496 ] Nathan Roberts commented on YARN-5352: -- This patch doesn't address localization. The thinking was that localization doesn't run application code, so while it may create files in /tmp (e.g. hsperf*), I wouldn't expect that to be a significant problem. I can look into addressing localization as well if folks think it's important. > Allow container-executor to use private /tmp > - > > Key: YARN-5352 > URL: https://issues.apache.org/jira/browse/YARN-5352 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-5352-v0.patch > > > It's very common for user code to create things in /tmp. Yes, applications > have means to specify alternate tmp directories but doing so is opt-in and > therefore doesn't happen in many cases. At a minimum, linux can use private > namespaces to create a private /tmp for each container so that it's using the > same space allocated to containers and it's automatically cleaned up as part > of container clean-up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5352) Allow container-executor to use private /tmp
[ https://issues.apache.org/jira/browse/YARN-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated YARN-5352: - Attachment: YARN-5352-v0.patch Patch that uses linux private namespaces and bind mounts to achieve a private /tmp. > Allow container-executor to use private /tmp > - > > Key: YARN-5352 > URL: https://issues.apache.org/jira/browse/YARN-5352 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-5352-v0.patch > > > It's very common for user code to create things in /tmp. Yes, applications > have means to specify alternate tmp directories but doing so is opt-in and > therefore doesn't happen in many cases. At a minimum, linux can use private > namespaces to create a private /tmp for each container so that it's using the > same space allocated to containers and it's automatically cleaned up as part > of container clean-up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5340) Race condition in RollingLevelDBTimelineStore#getAndSetStartTime()
[ https://issues.apache.org/jira/browse/YARN-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384489#comment-15384489 ] Junping Du commented on YARN-5340: -- I think another side effect of the current patch is that it forces access to {{startTimeWriteCache.get(entity);}} to acquire a lock first, which affects every put-entity operation. One way to make the lock finer-grained is to take the lock only when {{startTimeWriteCache.get(entity);}} doesn't get anything. > Race condition in RollingLevelDBTimelineStore#getAndSetStartTime() > -- > > Key: YARN-5340 > URL: https://issues.apache.org/jira/browse/YARN-5340 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Li Lu >Priority: Critical > Attachments: YARN-5340-trunk.001.patch > > > App Name/User/RPC Port/AM Host info is missing from ATS web service or YARN > CLI's app info > {code} > RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn --config > /tmp/hadoopConf application -status application_1467931619679_0001 > Application Report : > Application-Id : application_1467931619679_0001 > Application-Name : null > Application-Type : null > User : null > Queue : null > Application Priority : null > Start-Time : 0 > Finish-Time : 1467931672057 > Progress : 100% > State : FINISHED > Final-State : SUCCEEDED > Tracking-URL : N/A > RPC Port : -1 > AM Host : N/A > Aggregate Resource Allocation : 290014 MB-seconds, 74 vcore-seconds > Log Aggregation Status : N/A > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384373#comment-15384373 ] Jason Lowe commented on YARN-5382: -- I like the general idea, but I'm not sure a literal move of the audit log to the transition will work. The audit log will try to log the remote IP of the caller, but at the AppKilledTransition we're no longer in an RPC context so there's no remote caller. The basic information is actually in the kill message after YARN-5053, but not in a way that we can pull apart and pass as individual pieces of information to the audit logger (e.g.: user, remote IP, etc.). We could extend the kill event to optionally contain those bits then extend the audit logger API so we can manually specify the user and remote IP rather than have the audit logger always assume it can get them on its own. > RM does not audit log kill request for active applications > -- > > Key: YARN-5382 > URL: https://issues.apache.org/jira/browse/YARN-5382 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Jason Lowe >Assignee: Vrushali C > Attachments: YARN-5382-branch-2.7.01.patch, > YARN-5382-branch-2.7.02.patch > > > ClientRMService will audit a kill request but only if it either fails to > issue the kill or if the kill is sent to an already finished application. It > does not create a log entry when the application is active which is arguably > the most important case to audit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5401) yarn application kill does not let mapreduce jobs show up in jobhistory
[ https://issues.apache.org/jira/browse/YARN-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384330#comment-15384330 ] Jason Lowe commented on YARN-5401: -- This is effectively a duplicate of YARN-2261. MapReduce history requires the MapReduce ApplicationMaster to generate the history when it completes. hadoop job -kill or mapred job -kill accomplishes the kill by having the client connect to the MapReduce ApplicationMaster for the job and ask it to kill the job. Since this goes through the ApplicationMaster, it allows the history to be generated properly. When the kill is done via YARN, the ApplicationMaster is not involved. The ResourceManager kills the AM without the AM's knowledge. This is similar to kill vs. kill -9 (i.e.: SIGTERM vs SIGKILL) in POSIX. The former allows the application to perform cleanup tasks on the way down, while the latter mercilessly kills the process without any chance for cleanup. Since YARN does not allow the application to specify a cleanup task to be performed when the app dies, the MapReduce framework doesn't get a chance to finish generating the history for the job. > yarn application kill does not let mapreduce jobs show up in jobhistory > --- > > Key: YARN-5401 > URL: https://issues.apache.org/jira/browse/YARN-5401 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn > Environment: centos 6.6 > apache hadoop 2.6.4 >Reporter: Nikhil Mulley > > Hi, > It's been found in our cluster running apache hadoop 2.6.4 that while the > mapreduce jobs that are killed with the 'hadoop job -kill' command do end up having > the job and its counters reported to the jobhistory server, when 'yarn application > -kill' is used on a mapreduce application the job does not show up in the jobhistory > server interface. > Is this intentional? If so, any particular reasons? > It would be better to have mapreduce application history reported on > jobhistory irrespective of whether the kill is performed using the yarn application > cli or the hadoop job cli. > thanks, > Nikhil -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5213) Fix a bug in LogCLIHelpers which cause TestLogsCLI#testFetchApplictionLogs fails intermittently
[ https://issues.apache.org/jira/browse/YARN-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384289#comment-15384289 ] Hudson commented on YARN-5213: -- SUCCESS: Integrated in Hadoop-trunk-Commit #10118 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10118/]) YARN-5213. Fix a bug in LogCLIHelpers which cause (junping_du: rev dc2f4b6ac8a6f8848457046cf9e1362d8f48495d) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java > Fix a bug in LogCLIHelpers which cause TestLogsCLI#testFetchApplictionLogs > fails intermittently > --- > > Key: YARN-5213 > URL: https://issues.apache.org/jira/browse/YARN-5213 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Xuan Gong > Fix For: 2.9.0 > > Attachments: YARN-5213.1.patch, YARN-5213.2.patch, YARN-5213.patch > > > TestLogsCLI fails intermittently on build > [link|https://builds.apache.org/job/PreCommit-YARN-Build/11910/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt] > {noformat} > Running org.apache.hadoop.yarn.client.cli.TestLogsCLI > Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.708 sec > <<< FAILURE! - in org.apache.hadoop.yarn.client.cli.TestLogsCLI > testFetchApplictionLogs(org.apache.hadoop.yarn.client.cli.TestLogsCLI) Time > elapsed: 0.176 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[Hello]> but was:<[=]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.client.cli.TestLogsCLI.testFetchApplictionLogs(TestLogsCLI.java:389) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5213) Fix a bug in LogCLIHelpers which cause TestLogsCLI#testFetchApplictionLogs fails intermittently
[ https://issues.apache.org/jira/browse/YARN-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-5213: - Summary: Fix a bug in LogCLIHelpers which cause TestLogsCLI#testFetchApplictionLogs fails intermittently (was: TestLogsCLI#testFetchApplictionLogs fails intermittently) > Fix a bug in LogCLIHelpers which cause TestLogsCLI#testFetchApplictionLogs > fails intermittently > --- > > Key: YARN-5213 > URL: https://issues.apache.org/jira/browse/YARN-5213 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Xuan Gong > Attachments: YARN-5213.1.patch, YARN-5213.2.patch, YARN-5213.patch > > > TestLogsCLI fails intermittently on build > [link|https://builds.apache.org/job/PreCommit-YARN-Build/11910/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt] > {noformat} > Running org.apache.hadoop.yarn.client.cli.TestLogsCLI > Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.708 sec > <<< FAILURE! - in org.apache.hadoop.yarn.client.cli.TestLogsCLI > testFetchApplictionLogs(org.apache.hadoop.yarn.client.cli.TestLogsCLI) Time > elapsed: 0.176 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[Hello]> but was:<[=]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.client.cli.TestLogsCLI.testFetchApplictionLogs(TestLogsCLI.java:389) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5403) yarn top command does not execute correctly
[ https://issues.apache.org/jira/browse/YARN-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384148#comment-15384148 ] Hadoop QA commented on YARN-5403: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: The patch generated 5 new + 152 unchanged - 0 fixed = 157 total (was 152) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 16s {color} | {color:red} hadoop-yarn-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 21m 12s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.client.api.impl.TestYarnClient | | | hadoop.yarn.client.cli.TestLogsCLI | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818813/YARN-5403.patch | | JIRA Issue | YARN-5403 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 7ac530d3bc47 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / fe20494 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/12372/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/12372/artifact/patchprocess/whitespace-eol.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/12372/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt | | unit test logs | https://builds.apache.org/job/PreCommit-YARN-Build/12372/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12372/testReport/ | | modules | C:
[jira] [Updated] (YARN-5403) yarn top command does not execute correctly
[ https://issues.apache.org/jira/browse/YARN-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gu-chi updated YARN-5403: - Attachment: YARN-5403.patch > yarn top command does not execute correctly > - > > Key: YARN-5403 > URL: https://issues.apache.org/jira/browse/YARN-5403 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.2 >Reporter: gu-chi > Attachments: YARN-5403.patch > > > when executing {{yarn top}}, I always get an exception as below: > {quote} > 16/07/19 19:55:12 ERROR cli.TopCLI: Could not fetch RM start time > java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:589) > at java.net.Socket.connect(Socket.java:538) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient.(HttpClient.java:211) > at sun.net.www.http.HttpClient.New(HttpClient.java:308) > at sun.net.www.http.HttpClient.New(HttpClient.java:326) > at > sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169) > at > sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105) > at > sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999) > at > sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933) > at > org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:747) > at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:443) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:421) > YARN top - 19:55:13, up 17001d, 11:55, 0 active users, queue(s): root > {quote} > As I looked into it, the function {{getRMStartTime}} hardcodes HTTP > no matter what the {{yarn.http.policy}} setting is; it should consider > using HTTPS -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
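A sketch of the fix direction described in the report, assuming the existing YarnConfiguration.useHttps(conf) helper; the URL construction below is simplified for illustration and is not the attached patch:

{code}
// Sketch: derive the scheme from yarn.http.policy rather than
// hardcoding "http://" when building the RM web service URL.
import java.net.URL;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmUrlSketch {
  static URL rmClusterInfoUrl(Configuration conf, String rmAddress)
      throws Exception {
    String scheme =
        YarnConfiguration.useHttps(conf) ? "https://" : "http://";
    return new URL(scheme + rmAddress + "/ws/v1/cluster/info");
  }
}
{code}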
[jira] [Updated] (YARN-5403) yarn top command does not execute correctly
[ https://issues.apache.org/jira/browse/YARN-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gu-chi updated YARN-5403: - Attachment: (was: YARN-5403.patch) > yarn top command does not execute correctly > - > > Key: YARN-5403 > URL: https://issues.apache.org/jira/browse/YARN-5403 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.2 >Reporter: gu-chi > > when executing {{yarn top}}, I always get an exception as below: > {quote} > 16/07/19 19:55:12 ERROR cli.TopCLI: Could not fetch RM start time > java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:589) > at java.net.Socket.connect(Socket.java:538) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient.(HttpClient.java:211) > at sun.net.www.http.HttpClient.New(HttpClient.java:308) > at sun.net.www.http.HttpClient.New(HttpClient.java:326) > at > sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169) > at > sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105) > at > sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999) > at > sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933) > at > org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:747) > at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:443) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:421) > YARN top - 19:55:13, up 17001d, 11:55, 0 active users, queue(s): root > {quote} > As I looked into it, the function {{getRMStartTime}} hardcodes HTTP > no matter what the {{yarn.http.policy}} setting is; it should consider > using HTTPS -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5403) yarn top command does not execute correctly
[ https://issues.apache.org/jira/browse/YARN-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384066#comment-15384066 ] Hadoop QA commented on YARN-5403: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} | {color:red} YARN-5403 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818806/YARN-5403.patch | | JIRA Issue | YARN-5403 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12371/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > yarn top command does not execute correctly > - > > Key: YARN-5403 > URL: https://issues.apache.org/jira/browse/YARN-5403 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.2 >Reporter: gu-chi > Attachments: YARN-5403.patch > > > When I execute {{yarn top}}, I always get the exception below: > {quote} > 16/07/19 19:55:12 ERROR cli.TopCLI: Could not fetch RM start time > java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:589) > at java.net.Socket.connect(Socket.java:538) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient.<init>(HttpClient.java:211) > at sun.net.www.http.HttpClient.New(HttpClient.java:308) > at sun.net.www.http.HttpClient.New(HttpClient.java:326) > at > sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169) > at > sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105) > at > sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999) > at > sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933) > at > org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:747) > at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:443) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:421) > YARN top - 19:55:13, up 17001d, 11:55, 0 active users, queue(s): root > {quote} > Looking into it, the function {{getRMStartTime}} hardcodes HTTP > no matter what the {{yarn.http.policy}} setting is; it should use > HTTPS when that policy requires it -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5403) yarn top command does not execute correctly
[ https://issues.apache.org/jira/browse/YARN-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gu-chi updated YARN-5403: - Attachment: YARN-5403.patch > yarn top command does not execute correctly > - > > Key: YARN-5403 > URL: https://issues.apache.org/jira/browse/YARN-5403 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.2 >Reporter: gu-chi > Attachments: YARN-5403.patch > > > When I execute {{yarn top}}, I always get the exception below: > {quote} > 16/07/19 19:55:12 ERROR cli.TopCLI: Could not fetch RM start time > java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) > at > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204) > at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:589) > at java.net.Socket.connect(Socket.java:538) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient.<init>(HttpClient.java:211) > at sun.net.www.http.HttpClient.New(HttpClient.java:308) > at sun.net.www.http.HttpClient.New(HttpClient.java:326) > at > sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169) > at > sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105) > at > sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999) > at > sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933) > at > org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:747) > at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:443) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:421) > YARN top - 19:55:13, up 17001d, 11:55, 0 active users, queue(s): root > {quote} > Looking into it, the function {{getRMStartTime}} hardcodes HTTP > no matter what the {{yarn.http.policy}} setting is; it should use > HTTPS when that policy requires it -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5403) yarn top command does not execute correctly
gu-chi created YARN-5403: Summary: yarn top command does not execute correctly Key: YARN-5403 URL: https://issues.apache.org/jira/browse/YARN-5403 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.2 Reporter: gu-chi When I execute {{yarn top}}, I always get the exception below: {quote} 16/07/19 19:55:12 ERROR cli.TopCLI: Could not fetch RM start time java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at java.net.Socket.connect(Socket.java:538) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.<init>(HttpClient.java:211) at sun.net.www.http.HttpClient.New(HttpClient.java:308) at sun.net.www.http.HttpClient.New(HttpClient.java:326) at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169) at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105) at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999) at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933) at org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:747) at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:443) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:421) YARN top - 19:55:13, up 17001d, 11:55, 0 active users, queue(s): root {quote} Looking into it, the function {{getRMStartTime}} hardcodes HTTP no matter what the {{yarn.http.policy}} setting is; it should use HTTPS when that policy requires it -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
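For illustration, here is a minimal sketch of the direction the report suggests: derive the scheme from {{yarn.http.policy}} instead of hardcoding {{http://}}. The class, method, and argument names are assumptions for this sketch, not the actual {{TopCLI}} code or patch; only the config key and its values are taken from YARN.
{code:java}
// Sketch only: choose http/https from yarn.http.policy rather than a
// hardcoded "http://" prefix. RmUrlSketch, rmInfoUrl and rmWebAddress
// are illustrative names, not part of the actual TopCLI patch.
import java.net.URL;
import org.apache.hadoop.conf.Configuration;

public class RmUrlSketch {
  static URL rmInfoUrl(Configuration conf, String rmWebAddress) throws Exception {
    String policy = conf.get("yarn.http.policy", "HTTP_ONLY");
    String scheme = "HTTPS_ONLY".equals(policy) ? "https://" : "http://";
    // e.g. https://<rm-host>:<https-port>/ws/v1/cluster/info under HTTPS_ONLY
    return new URL(scheme + rmWebAddress + "/ws/v1/cluster/info");
  }
}
{code}
With {{yarn.http.policy=HTTPS_ONLY}}, the RM web endpoint only answers over TLS, which is exactly why the hardcoded-HTTP fetch above fails with Connection refused.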
[jira] [Commented] (YARN-3477) TimelineClientImpl swallows exceptions
[ https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384024#comment-15384024 ] Steve Loughran commented on YARN-3477: -- It would have been really nice if this patch had been reviewed and committed while it still compiled against the code as it was; it'll now be a significant piece of work to merge in. I don't have the time this week, and am off on vacation from Friday. Can you look at the core changes (the exception logging, wrapping, and {{InterruptedIOException}} rethrowing) and replicate them in your ongoing work? That's all that matters. I'll watch your JIRA and review it. > TimelineClientImpl swallows exceptions > -- > > Key: YARN-3477 > URL: https://issues.apache.org/jira/browse/YARN-3477 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0, 2.7.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-3477-001.patch, YARN-3477-002.patch > > > If the timeline client fails more than the retry count, the original exception is > not thrown. Instead a runtime exception is raised saying "retries run out" > # the failing exception should be rethrown, ideally via > NetUtils.wrapException to include the URL of the failing endpoint > # Otherwise, the raised RTE should (a) state that URL and (b) set the > original fault as the inner cause -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
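To make the requested behaviour concrete, a hedged sketch of such a retry loop follows. All names here are illustrative rather than the patch itself, and the issue additionally suggests routing the wrapping through {{NetUtils.wrapException}} to add host/port detail, which is not shown.
{code:java}
// Illustrative retry wrapper, not the actual TimelineClientImpl code:
// rethrow InterruptedIOException immediately, and surface the failing
// endpoint plus the original fault instead of a bare "retries run out".
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.URI;
import java.util.concurrent.Callable;

public class RetrySketch {
  static <T> T withRetries(URI endpoint, int maxRetries, Callable<T> op)
      throws IOException {
    IOException last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return op.call();
      } catch (InterruptedIOException iioe) {
        throw iioe;                 // never swallow interrupts
      } catch (IOException ioe) {
        last = ioe;                 // remember the real failure and retry
      } catch (Exception e) {
        throw new IOException("Failure talking to " + endpoint, e);
      }
    }
    // state the URL and keep the original fault as the inner cause
    throw new IOException("Gave up after " + maxRetries
        + " retries against " + endpoint, last);
  }
}
{code}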
[jira] [Commented] (YARN-5309) Fix SSLFactory truststore reloader thread leak in TimelineClientImpl
[ https://issues.apache.org/jira/browse/YARN-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384016#comment-15384016 ] Hadoop QA commented on YARN-5309: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 21s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 35s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} branch-2.8 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} branch-2.8 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} branch-2.8 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} branch-2.8 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 0 new + 21 unchanged - 2 fixed = 21 total (was 23) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 15s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 34s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 36m 33s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:5af2af1 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818795/YARN-5309.branch-2.8.001.patch | | JIRA Issue | YARN-5309 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle | | uname | Linux 2623ade54897
[jira] [Commented] (YARN-679) add an entry point that can start any Yarn service
[ https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384009#comment-15384009 ] Hadoop QA commented on YARN-679: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 33s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 1s {color} | {color:green} The patch appears to include 22 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 17s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 7m 17s {color} | {color:red} root generated 8 new + 709 unchanged - 0 fixed = 717 total (was 709) {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 26s {color} | {color:red} hadoop-common-project/hadoop-common: The patch generated 143 new + 119 unchanged - 34 fixed = 262 total (was 153) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 73 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 17s {color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 39m 29s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818791/YARN-679-009.patch | | JIRA Issue | YARN-679 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 1190dfe4fb46 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / fe20494 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/12368/artifact/patchprocess/diff-compile-javac-root.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/12368/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/12368/artifact/patchprocess/whitespace-eol.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12368/testReport/ | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12368/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > add an entry point that can start any Yarn service >
[jira] [Commented] (YARN-5309) Fix SSLFactory truststore reloader thread leak in TimelineClientImpl
[ https://issues.apache.org/jira/browse/YARN-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384008#comment-15384008 ] Hadoop QA commented on YARN-5309: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 2s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} branch-2.8 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} branch-2.8 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} branch-2.8 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} branch-2.8 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 0 new + 21 unchanged - 2 fixed = 21 total (was 23) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 23s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 37m 44s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:5af2af1 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818795/YARN-5309.branch-2.8.001.patch | | JIRA Issue | YARN-5309 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle | | uname | Linux cdbcdc5c7f80
[jira] [Updated] (YARN-5309) Fix SSLFactory truststore reloader thread leak in TimelineClientImpl
[ https://issues.apache.org/jira/browse/YARN-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-5309: -- Attachment: (was: YARN-5309.branch-2.8.001.patch) > Fix SSLFactory truststore reloader thread leak in TimelineClientImpl > > > Key: YARN-5309 > URL: https://issues.apache.org/jira/browse/YARN-5309 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver, yarn >Affects Versions: 2.7.1 >Reporter: Thomas Friedrich >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-5309.001.patch, YARN-5309.002.patch, > YARN-5309.003.patch, YARN-5309.004.patch, YARN-5309.005.patch, > YARN-5309.branch-2.7.3.001.patch, YARN-5309.branch-2.8.001.patch > > > We found an issue similar to HADOOP-11368 in TimelineClientImpl. The class > creates an instance of SSLFactory in newSslConnConfigurator and subsequently > creates the ReloadingX509TrustManager instance which in turn starts a trust > store reloader thread. > However, the SSLFactory is never destroyed and hence the trust store reloader > threads are not killed. > This problem was observed by a customer who had SSL enabled in Hadoop and > submitted many queries against the HiveServer2. After a few days, the HS2 > instance crashed and from the Java dump we could see many (over 13000) > threads like this: > "Truststore reloader thread" #126 daemon prio=5 os_prio=0 > tid=0x7f680d2e3000 nid=0x98fd waiting on > condition [0x7f67e482c000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run > (ReloadingX509TrustManager.java:225) > at java.lang.Thread.run(Thread.java:745) > HiveServer2 uses the JobClient to submit a job: > Thread [HiveServer2-Background-Pool: Thread-188] (Suspended (breakpoint at > line 89 in > ReloadingX509TrustManager)) > owns: Object (id=464) > owns: Object (id=465) > owns: Object (id=466) > owns: ServiceLoader (id=210) > ReloadingX509TrustManager.<init>(String, String, String, long) line: 89 > FileBasedKeyStoresFactory.init(SSLFactory$Mode) line: 209 > SSLFactory.init() line: 131 > TimelineClientImpl.newSslConnConfigurator(int, Configuration) line: 532 > TimelineClientImpl.newConnConfigurator(Configuration) line: 507 > TimelineClientImpl.serviceInit(Configuration) line: 269 > TimelineClientImpl(AbstractService).init(Configuration) line: 163 > YarnClientImpl.serviceInit(Configuration) line: 169 > YarnClientImpl(AbstractService).init(Configuration) line: 163 > ResourceMgrDelegate.serviceInit(Configuration) line: 102 > ResourceMgrDelegate(AbstractService).init(Configuration) line: 163 > ResourceMgrDelegate.<init>(YarnConfiguration) line: 96 > YARNRunner.<init>(Configuration) line: 112 > YarnClientProtocolProvider.create(Configuration) line: 34 > Cluster.initialize(InetSocketAddress, Configuration) line: 95 > Cluster.<init>(InetSocketAddress, Configuration) line: 82 > Cluster.<init>(Configuration) line: 75 > JobClient.init(JobConf) line: 475 > JobClient.<init>(JobConf) line: 454 > MapRedTask(ExecDriver).execute(DriverContext) line: 401 > MapRedTask.execute(DriverContext) line: 137 > MapRedTask(Task).executeTask() line: 160 > TaskRunner.runSequential() line: 88 > Driver.launchTask(Task, String, boolean, String, int, > DriverContext) line: 1653 > Driver.execute() line: 1412 > For every job, a new instance of JobClient/YarnClientImpl/TimelineClientImpl > is created.
But because the HS2 process stays up for days, the previous trust > store reloader threads are still hanging around in the HS2 process and > eventually consume all the available resources. > It seems like a fix similar to the one in HADOOP-11368 is needed in TimelineClientImpl, > but it doesn't have a destroy method to begin with. > One option to avoid this problem is to disable the yarn timeline service > (yarn.timeline-service.enabled=false). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5309) Fix SSLFactory truststore reloader thread leak in TimelineClientImpl
[ https://issues.apache.org/jira/browse/YARN-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-5309: -- Attachment: YARN-5309.branch-2.8.001.patch > Fix SSLFactory truststore reloader thread leak in TimelineClientImpl > > > Key: YARN-5309 > URL: https://issues.apache.org/jira/browse/YARN-5309 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver, yarn >Affects Versions: 2.7.1 >Reporter: Thomas Friedrich >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-5309.001.patch, YARN-5309.002.patch, > YARN-5309.003.patch, YARN-5309.004.patch, YARN-5309.005.patch, > YARN-5309.branch-2.7.3.001.patch, YARN-5309.branch-2.8.001.patch > > > We found an issue similar to HADOOP-11368 in TimelineClientImpl. The class > creates an instance of SSLFactory in newSslConnConfigurator and subsequently > creates the ReloadingX509TrustManager instance which in turn starts a trust > store reloader thread. > However, the SSLFactory is never destroyed and hence the trust store reloader > threads are not killed. > This problem was observed by a customer who had SSL enabled in Hadoop and > submitted many queries against the HiveServer2. After a few days, the HS2 > instance crashed and from the Java dump we could see many (over 13000) > threads like this: > "Truststore reloader thread" #126 daemon prio=5 os_prio=0 > tid=0x7f680d2e3000 nid=0x98fd waiting on > condition [0x7f67e482c000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run > (ReloadingX509TrustManager.java:225) > at java.lang.Thread.run(Thread.java:745) > HiveServer2 uses the JobClient to submit a job: > Thread [HiveServer2-Background-Pool: Thread-188] (Suspended (breakpoint at > line 89 in > ReloadingX509TrustManager)) > owns: Object (id=464) > owns: Object (id=465) > owns: Object (id=466) > owns: ServiceLoader (id=210) > ReloadingX509TrustManager.<init>(String, String, String, long) line: 89 > FileBasedKeyStoresFactory.init(SSLFactory$Mode) line: 209 > SSLFactory.init() line: 131 > TimelineClientImpl.newSslConnConfigurator(int, Configuration) line: 532 > TimelineClientImpl.newConnConfigurator(Configuration) line: 507 > TimelineClientImpl.serviceInit(Configuration) line: 269 > TimelineClientImpl(AbstractService).init(Configuration) line: 163 > YarnClientImpl.serviceInit(Configuration) line: 169 > YarnClientImpl(AbstractService).init(Configuration) line: 163 > ResourceMgrDelegate.serviceInit(Configuration) line: 102 > ResourceMgrDelegate(AbstractService).init(Configuration) line: 163 > ResourceMgrDelegate.<init>(YarnConfiguration) line: 96 > YARNRunner.<init>(Configuration) line: 112 > YarnClientProtocolProvider.create(Configuration) line: 34 > Cluster.initialize(InetSocketAddress, Configuration) line: 95 > Cluster.<init>(InetSocketAddress, Configuration) line: 82 > Cluster.<init>(Configuration) line: 75 > JobClient.init(JobConf) line: 475 > JobClient.<init>(JobConf) line: 454 > MapRedTask(ExecDriver).execute(DriverContext) line: 401 > MapRedTask.execute(DriverContext) line: 137 > MapRedTask(Task).executeTask() line: 160 > TaskRunner.runSequential() line: 88 > Driver.launchTask(Task, String, boolean, String, int, > DriverContext) line: 1653 > Driver.execute() line: 1412 > For every job, a new instance of JobClient/YarnClientImpl/TimelineClientImpl > is created.
But because the HS2 process stays up for days, the previous trust > store reloader threads are still hanging around in the HS2 process and > eventually consume all the available resources. > It seems like a fix similar to the one in HADOOP-11368 is needed in TimelineClientImpl, > but it doesn't have a destroy method to begin with. > One option to avoid this problem is to disable the yarn timeline service > (yarn.timeline-service.enabled=false). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5309) Fix SSLFactory truststore reloader thread leak in TimelineClientImpl
[ https://issues.apache.org/jira/browse/YARN-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-5309: -- Attachment: YARN-5309.branch-2.8.001.patch YARN-5309.branch-2.7.3.001.patch Hello [~vvasudev], I have attached patches for branch-2.7.3 and branch-2.8. Please check. Thanks for all the help. > Fix SSLFactory truststore reloader thread leak in TimelineClientImpl > > > Key: YARN-5309 > URL: https://issues.apache.org/jira/browse/YARN-5309 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver, yarn >Affects Versions: 2.7.1 >Reporter: Thomas Friedrich >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-5309.001.patch, YARN-5309.002.patch, > YARN-5309.003.patch, YARN-5309.004.patch, YARN-5309.005.patch, > YARN-5309.branch-2.7.3.001.patch, YARN-5309.branch-2.8.001.patch > > > We found an issue similar to HADOOP-11368 in TimelineClientImpl. The class > creates an instance of SSLFactory in newSslConnConfigurator and subsequently > creates the ReloadingX509TrustManager instance which in turn starts a trust > store reloader thread. > However, the SSLFactory is never destroyed and hence the trust store reloader > threads are not killed. > This problem was observed by a customer who had SSL enabled in Hadoop and > submitted many queries against the HiveServer2. After a few days, the HS2 > instance crashed and from the Java dump we could see many (over 13000) > threads like this: > "Truststore reloader thread" #126 daemon prio=5 os_prio=0 > tid=0x7f680d2e3000 nid=0x98fd waiting on > condition [0x7f67e482c000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run > (ReloadingX509TrustManager.java:225) > at java.lang.Thread.run(Thread.java:745) > HiveServer2 uses the JobClient to submit a job: > Thread [HiveServer2-Background-Pool: Thread-188] (Suspended (breakpoint at > line 89 in > ReloadingX509TrustManager)) > owns: Object (id=464) > owns: Object (id=465) > owns: Object (id=466) > owns: ServiceLoader (id=210) > ReloadingX509TrustManager.<init>(String, String, String, long) line: 89 > FileBasedKeyStoresFactory.init(SSLFactory$Mode) line: 209 > SSLFactory.init() line: 131 > TimelineClientImpl.newSslConnConfigurator(int, Configuration) line: 532 > TimelineClientImpl.newConnConfigurator(Configuration) line: 507 > TimelineClientImpl.serviceInit(Configuration) line: 269 > TimelineClientImpl(AbstractService).init(Configuration) line: 163 > YarnClientImpl.serviceInit(Configuration) line: 169 > YarnClientImpl(AbstractService).init(Configuration) line: 163 > ResourceMgrDelegate.serviceInit(Configuration) line: 102 > ResourceMgrDelegate(AbstractService).init(Configuration) line: 163 > ResourceMgrDelegate.<init>(YarnConfiguration) line: 96 > YARNRunner.<init>(Configuration) line: 112 > YarnClientProtocolProvider.create(Configuration) line: 34 > Cluster.initialize(InetSocketAddress, Configuration) line: 95 > Cluster.<init>(InetSocketAddress, Configuration) line: 82 > Cluster.<init>(Configuration) line: 75 > JobClient.init(JobConf) line: 475 > JobClient.<init>(JobConf) line: 454 > MapRedTask(ExecDriver).execute(DriverContext) line: 401 > MapRedTask.execute(DriverContext) line: 137 > MapRedTask(Task).executeTask() line: 160 > TaskRunner.runSequential() line: 88 > Driver.launchTask(Task, String, boolean, String, int, > DriverContext) line: 1653 > Driver.execute() line: 1412 > For every job, a new instance of JobClient/YarnClientImpl/TimelineClientImpl > is created.
But because the HS2 process stays up for days, the previous trust > store reloader threads are still hanging around in the HS2 process and > eventually consume all the available resources. > It seems like a fix similar to the one in HADOOP-11368 is needed in TimelineClientImpl, > but it doesn't have a destroy method to begin with. > One option to avoid this problem is to disable the yarn timeline service > (yarn.timeline-service.enabled=false). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
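The shape of the fix, as a minimal sketch: keep a reference to the {{SSLFactory}} and destroy it when the owning service stops, which terminates the reloader thread. The class and field names below are illustrative, assuming the patch follows this pattern; they are not the committed code.
{code:java}
// Minimal sketch: pair SSLFactory.init() with SSLFactory.destroy() in the
// service lifecycle so the truststore reloader thread cannot leak.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.ssl.SSLFactory;
import org.apache.hadoop.service.AbstractService;

public class SslAwareClient extends AbstractService {
  private SSLFactory sslFactory;

  public SslAwareClient() { super("SslAwareClient"); }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    sslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
    sslFactory.init();          // starts the truststore reloader thread
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStop() throws Exception {
    if (sslFactory != null) {
      sslFactory.destroy();     // stops the reloader thread, fixing the leak
    }
    super.serviceStop();
  }
}
{code}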
[jira] [Commented] (YARN-4996) Make TestNMReconnect.testCompareRMNodeAfterReconnect() scheduler agnostic, or better yet parameterized
[ https://issues.apache.org/jira/browse/YARN-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383951#comment-15383951 ] Hudson commented on YARN-4996: -- SUCCESS: Integrated in Hadoop-trunk-Commit #10117 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10117/]) YARN-4996. Make TestNMReconnect.testCompareRMNodeAfterReconnect() (varunsaxena: rev fe20494a728836c974a4cfa062e1802583fdc934) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ParameterizedSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java > Make TestNMReconnect.testCompareRMNodeAfterReconnect() scheduler agnostic, or > better yet parameterized > -- > > Key: YARN-4996 > URL: https://issues.apache.org/jira/browse/YARN-4996 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, test >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Kai Sasaki >Priority: Minor > Labels: newbie > Fix For: 2.9.0 > > Attachments: YARN-4996.01.patch, YARN-4996.02.patch, > YARN-4996.03.patch, YARN-4996.04.patch, YARN-4996.05.patch, > YARN-4996.06.patch, YARN-4996.07.patch, YARN-4996.08.patch > > > The test tests only the capacity scheduler. It should also test fair > scheduler. At a bare minimum, it should use the default scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
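For readers unfamiliar with the pattern, a scheduler-parameterized test looks roughly like the JUnit 4 sketch below. The scheduler class names and the {{yarn.resourcemanager.scheduler.class}} key are real, but the skeleton is illustrative rather than the committed {{ParameterizedSchedulerTestBase}} change.
{code:java}
// Illustrative JUnit 4 skeleton: run the same test once per scheduler.
import java.util.Arrays;
import java.util.Collection;
import org.apache.hadoop.conf.Configuration;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class TestNMReconnectSketch {
  @Parameterized.Parameters
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] {
        {"org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler"},
        {"org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"}});
  }

  private final String schedulerClass;

  public TestNMReconnectSketch(String schedulerClass) {
    this.schedulerClass = schedulerClass;
  }

  @Test
  public void reconnectBehavesTheSameUnderEachScheduler() {
    Configuration conf = new Configuration();
    conf.set("yarn.resourcemanager.scheduler.class", schedulerClass);
    // ...bring up the RM test fixture with conf and run the reconnect
    // assertions; elided in this sketch.
  }
}
{code}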
[jira] [Updated] (YARN-679) add an entry point that can start any Yarn service
[ https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-679: Attachment: YARN-679-009.patch Patch 009, synced up with trunk. HADOOP-13179 broke things, as it made the building of common options private/static rather than subclassable, which I needed for some things. I fixed this by making it protected and synchronized on {{OptionBuilder}}, which is what everything should do. [~aw] I've edited references to the PR in the JIRA. Will patches now take, or is there some secret DB I need to alter? > add an entry point that can start any Yarn service > -- > > Key: YARN-679 > URL: https://issues.apache.org/jira/browse/YARN-679 > Project: Hadoop YARN > Issue Type: New Feature > Components: api >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: YARN-679-001.patch, YARN-679-002.patch, > YARN-679-002.patch, YARN-679-003.patch, YARN-679-004.patch, > YARN-679-005.patch, YARN-679-006.patch, YARN-679-007.patch, > YARN-679-008.patch, YARN-679-009.patch, org.apache.hadoop.servic...mon > 3.0.0-SNAPSHOT API).pdf > > Time Spent: 72h > Remaining Estimate: 0h > > There's no need to write separate .main classes for every Yarn service, given > that the startup mechanism should be identical: create, init, start, wait for > stopped, with an interrupt handler to trigger a clean shutdown on a control-c > interrupt. > Provide one that takes any classname and a list of config files/options -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
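As a rough sketch of the idea (reflection plus a shutdown hook); the real patch also handles config files, options, and exit codes, so everything below is a much-simplified illustration, not the attached code:
{code:java}
// Illustrative generic launcher: instantiate any Service subclass by
// classname, init/start it, register a Ctrl-C hook, and wait for stop.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.Service;

public final class ServiceLauncherSketch {
  public static void main(String[] args) throws Exception {
    String classname = args[0];
    Configuration conf = new Configuration();
    final Service service = (Service) Class.forName(classname).newInstance();
    // clean shutdown on control-c / SIGTERM
    Runtime.getRuntime().addShutdownHook(new Thread(service::stop));
    service.init(conf);
    service.start();
    service.waitForServiceToStop(0);   // block until the service terminates
  }
}
{code}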
[jira] [Commented] (YARN-4996) Make TestNMReconnect.testCompareRMNodeAfterReconnect() scheduler agnostic, or better yet parameterized
[ https://issues.apache.org/jira/browse/YARN-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383932#comment-15383932 ] Kai Sasaki commented on YARN-4996: -- [~varun_saxena] Thank you so much! > Make TestNMReconnect.testCompareRMNodeAfterReconnect() scheduler agnostic, or > better yet parameterized > -- > > Key: YARN-4996 > URL: https://issues.apache.org/jira/browse/YARN-4996 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, test >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Kai Sasaki >Priority: Minor > Labels: newbie > Fix For: 2.9.0 > > Attachments: YARN-4996.01.patch, YARN-4996.02.patch, > YARN-4996.03.patch, YARN-4996.04.patch, YARN-4996.05.patch, > YARN-4996.06.patch, YARN-4996.07.patch, YARN-4996.08.patch > > > The test tests only the capacity scheduler. It should also test fair > scheduler. At a bare minimum, it should use the default scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4996) Make TestNMReconnect.testCompareRMNodeAfterReconnect() scheduler agnostic, or better yet parameterized
[ https://issues.apache.org/jira/browse/YARN-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383927#comment-15383927 ] Varun Saxena commented on YARN-4996: Committed the latest patch to trunk, branch-2. Thanks [~lewuathe] for your contribution and [~templedf] for the reviews. > Make TestNMReconnect.testCompareRMNodeAfterReconnect() scheduler agnostic, or > better yet parameterized > -- > > Key: YARN-4996 > URL: https://issues.apache.org/jira/browse/YARN-4996 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, test >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Kai Sasaki >Priority: Minor > Labels: newbie > Fix For: 2.9.0 > > Attachments: YARN-4996.01.patch, YARN-4996.02.patch, > YARN-4996.03.patch, YARN-4996.04.patch, YARN-4996.05.patch, > YARN-4996.06.patch, YARN-4996.07.patch, YARN-4996.08.patch > > > The test tests only the capacity scheduler. It should also test fair > scheduler. At a bare minimum, it should use the default scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5309) Fix SSLFactory truststore reloader thread leak in TimelineClientImpl
[ https://issues.apache.org/jira/browse/YARN-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383755#comment-15383755 ] Varun Vasudev commented on YARN-5309: - [~cheersyang] - the patch doesn't apply cleanly on branch-2.7. Can you please add a patch for branch-2.7 if you need this to go into 2.7.3? Thanks! > Fix SSLFactory truststore reloader thread leak in TimelineClientImpl > > > Key: YARN-5309 > URL: https://issues.apache.org/jira/browse/YARN-5309 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver, yarn >Affects Versions: 2.7.1 >Reporter: Thomas Friedrich >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-5309.001.patch, YARN-5309.002.patch, > YARN-5309.003.patch, YARN-5309.004.patch, YARN-5309.005.patch > > > We found an issue similar to HADOOP-11368 in TimelineClientImpl. The class > creates an instance of SSLFactory in newSslConnConfigurator and subsequently > creates the ReloadingX509TrustManager instance which in turn starts a trust > store reloader thread. > However, the SSLFactory is never destroyed and hence the trust store reloader > threads are not killed. > This problem was observed by a customer who had SSL enabled in Hadoop and > submitted many queries against the HiveServer2. After a few days, the HS2 > instance crashed and from the Java dump we could see many (over 13000) > threads like this: > "Truststore reloader thread" #126 daemon prio=5 os_prio=0 > tid=0x7f680d2e3000 nid=0x98fd waiting on > condition [0x7f67e482c000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run > (ReloadingX509TrustManager.java:225) > at java.lang.Thread.run(Thread.java:745) > HiveServer2 uses the JobClient to submit a job: > Thread [HiveServer2-Background-Pool: Thread-188] (Suspended (breakpoint at > line 89 in > ReloadingX509TrustManager)) > owns: Object (id=464) > owns: Object (id=465) > owns: Object (id=466) > owns: ServiceLoader (id=210) > ReloadingX509TrustManager.<init>(String, String, String, long) line: 89 > FileBasedKeyStoresFactory.init(SSLFactory$Mode) line: 209 > SSLFactory.init() line: 131 > TimelineClientImpl.newSslConnConfigurator(int, Configuration) line: 532 > TimelineClientImpl.newConnConfigurator(Configuration) line: 507 > TimelineClientImpl.serviceInit(Configuration) line: 269 > TimelineClientImpl(AbstractService).init(Configuration) line: 163 > YarnClientImpl.serviceInit(Configuration) line: 169 > YarnClientImpl(AbstractService).init(Configuration) line: 163 > ResourceMgrDelegate.serviceInit(Configuration) line: 102 > ResourceMgrDelegate(AbstractService).init(Configuration) line: 163 > ResourceMgrDelegate.<init>(YarnConfiguration) line: 96 > YARNRunner.<init>(Configuration) line: 112 > YarnClientProtocolProvider.create(Configuration) line: 34 > Cluster.initialize(InetSocketAddress, Configuration) line: 95 > Cluster.<init>(InetSocketAddress, Configuration) line: 82 > Cluster.<init>(Configuration) line: 75 > JobClient.init(JobConf) line: 475 > JobClient.<init>(JobConf) line: 454 > MapRedTask(ExecDriver).execute(DriverContext) line: 401 > MapRedTask.execute(DriverContext) line: 137 > MapRedTask(Task).executeTask() line: 160 > TaskRunner.runSequential() line: 88 > Driver.launchTask(Task, String, boolean, String, int, > DriverContext) line: 1653 > Driver.execute() line: 1412 > For every job, a new instance of JobClient/YarnClientImpl/TimelineClientImpl > is created.
But because the HS2 process stays up for days, the previous trust > store reloader threads are still hanging around in the HS2 process and > eventually consume all the available resources. > It seems like a fix similar to the one in HADOOP-11368 is needed in TimelineClientImpl, > but it doesn't have a destroy method to begin with. > One option to avoid this problem is to disable the yarn timeline service > (yarn.timeline-service.enabled=false). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5309) Fix SSLFactory truststore reloader thread leak in TimelineClientImpl
[ https://issues.apache.org/jira/browse/YARN-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-5309: Summary: Fix SSLFactory truststore reloader thread leak in TimelineClientImpl (was: SSLFactory truststore reloader thread leak in TimelineClientImpl) > Fix SSLFactory truststore reloader thread leak in TimelineClientImpl > > > Key: YARN-5309 > URL: https://issues.apache.org/jira/browse/YARN-5309 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver, yarn >Affects Versions: 2.7.1 >Reporter: Thomas Friedrich >Assignee: Weiwei Yang >Priority: Blocker > Attachments: YARN-5309.001.patch, YARN-5309.002.patch, > YARN-5309.003.patch, YARN-5309.004.patch, YARN-5309.005.patch > > > We found an issue similar to HADOOP-11368 in TimelineClientImpl. The class > creates an instance of SSLFactory in newSslConnConfigurator and subsequently > creates the ReloadingX509TrustManager instance which in turn starts a trust > store reloader thread. > However, the SSLFactory is never destroyed and hence the trust store reloader > threads are not killed. > This problem was observed by a customer who had SSL enabled in Hadoop and > submitted many queries against the HiveServer2. After a few days, the HS2 > instance crashed and from the Java dump we could see many (over 13000) > threads like this: > "Truststore reloader thread" #126 daemon prio=5 os_prio=0 > tid=0x7f680d2e3000 nid=0x98fd waiting on > condition [0x7f67e482c000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run > (ReloadingX509TrustManager.java:225) > at java.lang.Thread.run(Thread.java:745) > HiveServer2 uses the JobClient to submit a job: > Thread [HiveServer2-Background-Pool: Thread-188] (Suspended (breakpoint at > line 89 in > ReloadingX509TrustManager)) > owns: Object (id=464) > owns: Object (id=465) > owns: Object (id=466) > owns: ServiceLoader (id=210) > ReloadingX509TrustManager.<init>(String, String, String, long) line: 89 > FileBasedKeyStoresFactory.init(SSLFactory$Mode) line: 209 > SSLFactory.init() line: 131 > TimelineClientImpl.newSslConnConfigurator(int, Configuration) line: 532 > TimelineClientImpl.newConnConfigurator(Configuration) line: 507 > TimelineClientImpl.serviceInit(Configuration) line: 269 > TimelineClientImpl(AbstractService).init(Configuration) line: 163 > YarnClientImpl.serviceInit(Configuration) line: 169 > YarnClientImpl(AbstractService).init(Configuration) line: 163 > ResourceMgrDelegate.serviceInit(Configuration) line: 102 > ResourceMgrDelegate(AbstractService).init(Configuration) line: 163 > ResourceMgrDelegate.<init>(YarnConfiguration) line: 96 > YARNRunner.<init>(Configuration) line: 112 > YarnClientProtocolProvider.create(Configuration) line: 34 > Cluster.initialize(InetSocketAddress, Configuration) line: 95 > Cluster.<init>(InetSocketAddress, Configuration) line: 82 > Cluster.<init>(Configuration) line: 75 > JobClient.init(JobConf) line: 475 > JobClient.<init>(JobConf) line: 454 > MapRedTask(ExecDriver).execute(DriverContext) line: 401 > MapRedTask.execute(DriverContext) line: 137 > MapRedTask(Task).executeTask() line: 160 > TaskRunner.runSequential() line: 88 > Driver.launchTask(Task, String, boolean, String, int, > DriverContext) line: 1653 > Driver.execute() line: 1412 > For every job, a new instance of JobClient/YarnClientImpl/TimelineClientImpl > is created.
But because the HS2 process stays up for days, the previous trust > store reloader threads are still hanging around in the HS2 process and > eventually consume all the available resources. > It seems like a fix similar to the one in HADOOP-11368 is needed in TimelineClientImpl, > but it doesn't have a destroy method to begin with. > One option to avoid this problem is to disable the yarn timeline service > (yarn.timeline-service.enabled=false). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4996) Make TestNMReconnect.testCompareRMNodeAfterReconnect() scheduler agnostic, or better yet parameterized
[ https://issues.apache.org/jira/browse/YARN-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383632#comment-15383632 ] Hadoop QA commented on YARN-4996: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s {color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 10 unchanged - 1 fixed = 10 total (was 11) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 36m 44s {color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 53m 20s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12818742/YARN-4996.08.patch | | JIRA Issue | YARN-4996 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 0e8ad5850526 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 92fe2db | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/12367/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/12367/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > Make TestNMReconnect.testCompareRMNodeAfterReconnect() scheduler agnostic, or > better yet parameterized > -- > > Key: YARN-4996 > URL: https://issues.apache.org/jira/browse/YARN-4996 > Project: Hadoop YARN > Issue Type: