[jira] [Comment Edited] (YARN-5597) YARN Federation improvements
[ https://issues.apache.org/jira/browse/YARN-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606719#comment-16606719 ] Bibin A Chundatt edited comment on YARN-5597 at 9/7/18 5:43 AM: [~subru]/[~elgoiri] {quote} We use the same ZK ensemble and connection string so we are not having issues here. {quote} In an HA subcluster, using the same ZK for leader election, the RM store, and the Federation Store plus Kerberos shouldn't have any issue, as per my understanding. But the above topology could put load on ZK, since all subcluster RMs will write their stores to a single ZK ensemble. Why not have a separate conf for the FederationStore connection string? MySQL seems the best fit for now if ZK security is required. What is the cleanup strategy for metadata? In the Federation store it's not required to keep the apps list (router-to-app mapping) once the apps are flushed out of RM memory/store, right? was (Author: bibinchundatt): [~subru]/[~elgoiri] {quote} We use the same ZK ensemble and connection string so we are not having issues here. {quote} In HA subcluster with same Zk for, leader election,RM store and Federation Store + kerberos shouldn't have any issue as per understanding. Above topology could be load on Zk, since all subclusters will write Store to single ZK ensemble. Why not have separate conf for FederaionStore connection string? Mysql seems best fit now if zk security is required. What is the clean up strategy for metadata ?. In Federation store its not required to keep the apps list(router mapping to app) once the apps is flushed out from RM memory/store rt ?? > YARN Federation improvements > > > Key: YARN-5597 > URL: https://issues.apache.org/jira/browse/YARN-5597 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Subru Krishnan >Assignee: Subru Krishnan >Priority: Major > > This umbrella JIRA tracks set of improvements over the YARN Federation MVP > (YARN-2915) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5597) YARN Federation improvements
[ https://issues.apache.org/jira/browse/YARN-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606719#comment-16606719 ] Bibin A Chundatt commented on YARN-5597: [~subru]/[~elgoiri] {quote} We use the same ZK ensemble and connection string so we are not having issues here. {quote} In an HA subcluster, using the same ZK for leader election, the RM store, and the Federation Store plus Kerberos shouldn't have any issue, as per my understanding. The above topology could put load on ZK, since all subclusters will write their stores to a single ZK ensemble. Why not have a separate conf for the FederationStore connection string? MySQL seems the best fit for now if ZK security is required. What is the cleanup strategy for metadata? In the Federation store it's not required to keep the apps list (router-to-app mapping) once the apps are flushed out of RM memory/store, right? > YARN Federation improvements > > > Key: YARN-5597 > URL: https://issues.apache.org/jira/browse/YARN-5597 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Subru Krishnan >Assignee: Subru Krishnan >Priority: Major > > This umbrella JIRA tracks set of improvements over the YARN Federation MVP > (YARN-2915) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
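For context on the FederationStore question above, here is a minimal sketch of pointing the FederationStateStore at its own backend instead of the shared RM ZK ensemble. The store class and the yarn.federation.state-store.sql.* keys follow the Hadoop Federation documentation; the MySQL endpoint is a placeholder, so treat the exact values as assumptions to verify against your Hadoop version.

{code:java}
// Sketch: SQL-backed FederationStateStore with its own connection string,
// keeping federation metadata traffic off the per-subcluster ZK ensembles.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FederationSqlStoreConfig {
  static Configuration sqlBackedFederationStore() {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean("yarn.federation.enabled", true);
    conf.set("yarn.federation.state-store.class",
        "org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore");
    // Connection string independent of hadoop.zk.address (placeholder host).
    conf.set("yarn.federation.state-store.sql.url",
        "jdbc:mysql://statestore-host:3306/FederationStateStore");
    conf.set("yarn.federation.state-store.sql.jdbc-class",
        "com.mysql.jdbc.jdbc2.optional.MysqlDataSource");
    return conf;
  }
}
{code}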
[jira] [Commented] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router
[ https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606694#comment-16606694 ] Hadoop QA commented on YARN-8699: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 24s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 37s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 51s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 2s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 54s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 6s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 57s{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 57s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}135m 48s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8699 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938748/YARN-8699.005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 36dbbdeb2b15 3.13.0-153-generic
[jira] [Commented] (YARN-8752) yarn-registry.md has wrong word ong-lived,it should be long-lived
[ https://issues.apache.org/jira/browse/YARN-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606686#comment-16606686 ] Hadoop QA commented on YARN-8752: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 30m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 45m 1s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8752 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938754/YARN-8752-1.patch | | Optional Tests | dupname asflicense mvnsite | | uname | Linux 967f155e9989 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 396ce7b | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 302 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21783/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. 
> yarn-registry.md has wrong word ong-lived,it should be long-lived > - > > Key: YARN-8752 > URL: https://issues.apache.org/jira/browse/YARN-8752 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.1.0 >Reporter: leiqiang >Priority: Major > Labels: documentation > Attachments: YARN-8752-1.patch > > > In yarn-registry.md line 88, > deploy {color:#FF}ong-lived{color} services instances, this word should > be {color:#FF}long-lived{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8718) Merge related work for YARN-3409
[ https://issues.apache.org/jira/browse/YARN-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606673#comment-16606673 ] Sunil Govindan commented on YARN-8718: -- # ASF license issues are not due to the patch. They are in MR code, which this branch didn't change. # Test case failures are not related to the branch code. TestTimelineClientV2Impl#testSyncCall will be tracked in a separate JIRA against trunk. # The whitespace issue is in /bin/yarn. We added the nodeattributes command line, and it follows the same pattern as the other commands. Hence skipping this. # checkstyle errors are fixed where possible. The remaining issues are mostly about keeping existing method lengths under 150 lines, access methods for private/protected fields, etc. > Merge related work for YARN-3409 > > > Key: YARN-8718 > URL: https://issues.apache.org/jira/browse/YARN-8718 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Priority: Major > Attachments: YARN-3409.001.patch, YARN-3409.002.patch, > YARN-8718.003.patch, YARN-8718.004.patch, YARN-8718.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8717) set memory.limit_in_bytes when NodeManager starting
[ https://issues.apache.org/jira/browse/YARN-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594858#comment-16594858 ] Jiandan Yang edited comment on YARN-8717 at 9/7/18 3:05 AM: - Hi [~cheersyang], thanks for watching. We found the NM was killed by the OOM killer. The conditions are as follows: ``` yarn.nodemanager.resource.memory.enforced=false yarn.nodemanager.resource.memory-mb = 100G Physical memory of the NM machine is 120G NM has two containers, each requesting 40G of memory, but each actually using 50G+ ``` So we thought of setting the limit on the hadoop-yarn hierarchy. was (Author: yangjiandan): Hi [~cheersyang] Thanks for watching. We found NM was killed by OOM-killer. conditions are as follows: ``` yarn.nodemanager.resource.memory.enabled=false yarn.nodemanager.resource.memory-mb = 100G Physical Memory of NM machine is 120G NM has two container, each requests 40G memory, but actual each request 50G+ ``` So we thought setting limit on the hireachy of hadoop-yarn > set memory.limit_in_bytes when NodeManager starting > --- > > Key: YARN-8717 > URL: https://issues.apache.org/jira/browse/YARN-8717 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jiandan Yang >Assignee: Jiandan Yang >Priority: Major > Labels: cgroups > Attachments: YARN-8717.001.patch > > > CGroupsCpuResourceHandlerImpl sets the cpu quota at the hadoop-yarn hierarchy > to restrict the total cpu resource of the NM when the NM starts; > CGroupsMemoryResourceHandlerImpl should likewise set memory.limit_in_bytes at > the hadoop-yarn hierarchy to control the memory resource of the NM -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
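As a rough illustration of the proposal, capping the hadoop-yarn hierarchy amounts to writing the NM memory budget into memory.limit_in_bytes. This is a standalone sketch, not the actual CGroupsMemoryResourceHandlerImpl change: the cgroup v1 mount path is the common default, and a real patch would go through YARN's CGroupsHandler abstraction instead.

{code:java}
// Sketch: cap the whole hadoop-yarn cgroup at yarn.nodemanager.resource.memory-mb
// (100G in the scenario above), so containers cannot collectively exceed the NM
// budget and trigger the host OOM killer. Assumes the default cgroup v1 mount.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class YarnMemoryHierarchyLimit {
  static void applyNmLimit(long nmMemoryMb) throws IOException {
    long limitBytes = nmMemoryMb * 1024L * 1024L;
    Files.write(
        Paths.get("/sys/fs/cgroup/memory/hadoop-yarn/memory.limit_in_bytes"),
        String.valueOf(limitBytes).getBytes(StandardCharsets.UTF_8));
  }
}
{code}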
[jira] [Updated] (YARN-8752) yarn-registry.md has wrong word ong-lived,it should be long-lived
[ https://issues.apache.org/jira/browse/YARN-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leiqiang updated YARN-8752: --- Attachment: (was: YARN-8752-1.patch) > yarn-registry.md has wrong word ong-lived,it should be long-lived > - > > Key: YARN-8752 > URL: https://issues.apache.org/jira/browse/YARN-8752 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.1.0 >Reporter: leiqiang >Priority: Major > Labels: documentation > > In yarn-registry.md line 88, > deploy {color:#FF}ong-lived{color} services instances, this word should > be {color:#FF}long-lived{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8752) yarn-registry.md has wrong word ong-lived,it should be long-lived
[ https://issues.apache.org/jira/browse/YARN-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leiqiang updated YARN-8752: --- Attachment: YARN-8752-1.patch > yarn-registry.md has wrong word ong-lived,it should be long-lived > - > > Key: YARN-8752 > URL: https://issues.apache.org/jira/browse/YARN-8752 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.1.0 >Reporter: leiqiang >Priority: Major > Labels: documentation > Attachments: YARN-8752-1.patch > > > In yarn-registry.md line 88, > deploy {color:#FF}ong-lived{color} services instances, this word should > be {color:#FF}long-lived{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8752) yarn-registry.md has wrong word ong-lived,it should be long-lived
leiqiang created YARN-8752: -- Summary: yarn-registry.md has wrong word ong-lived,it should be long-lived Key: YARN-8752 URL: https://issues.apache.org/jira/browse/YARN-8752 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 3.1.0 Reporter: leiqiang In yarn-registry.md line 88, deploy {color:#FF}ong-lived{color} services instances, this word should be {color:#FF}long-lived{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router
[ https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606627#comment-16606627 ] Bibin A Chundatt edited comment on YARN-8699 at 9/7/18 1:59 AM: Thank you [~giovanni.fumarola] for the review. The attached patch handles the typo fix too. was (Author: bibinchundatt): [~giovanni.fumarola] Attached patch handling typo fix too. > Add Yarnclient#yarnclusterMetrics API implementation in router > -- > > Key: YARN-8699 > URL: https://issues.apache.org/jira/browse/YARN-8699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8699.001.patch, YARN-8699.002.patch, > YARN-8699.003.patch, YARN-8699.004.patch, YARN-8699.005.patch > > > Implement YarnclusterMetrics API in FederationClientInterceptor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router
[ https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8699: --- Attachment: YARN-8699.005.patch > Add Yarnclient#yarnclusterMetrics API implementation in router > -- > > Key: YARN-8699 > URL: https://issues.apache.org/jira/browse/YARN-8699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8699.001.patch, YARN-8699.002.patch, > YARN-8699.003.patch, YARN-8699.004.patch, YARN-8699.005.patch > > > Implement YarnclusterMetrics API in FederationClientInterceptor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router
[ https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606627#comment-16606627 ] Bibin A Chundatt commented on YARN-8699: [~giovanni.fumarola] The attached patch handles the typo fix too. > Add Yarnclient#yarnclusterMetrics API implementation in router > -- > > Key: YARN-8699 > URL: https://issues.apache.org/jira/browse/YARN-8699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8699.001.patch, YARN-8699.002.patch, > YARN-8699.003.patch, YARN-8699.004.patch, YARN-8699.005.patch > > > Implement YarnclusterMetrics API in FederationClientInterceptor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606590#comment-16606590 ] Hadoop QA commented on YARN-8658: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 51s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 52s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 4s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 30s{color} | {color:red} hadoop-yarn-server-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 21s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 29s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 48s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 23s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 26s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 24s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 77m 21s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8658 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938734/YARN-8658.04.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 1b8ad07e56a9 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Commented] (YARN-5597) YARN Federation improvements
[ https://issues.apache.org/jira/browse/YARN-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606535#comment-16606535 ] Íñigo Goiri commented on YARN-5597: --- We are currently testing federation using ZK for both the FederationStateStore and the RMStateStore. We use the same ZK ensemble and connection string so we are not having issues here. However, we haven't tested it with Kerberos yet. > YARN Federation improvements > > > Key: YARN-5597 > URL: https://issues.apache.org/jira/browse/YARN-5597 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Subru Krishnan >Assignee: Subru Krishnan >Priority: Major > > This umbrella JIRA tracks set of improvements over the YARN Federation MVP > (YARN-2915) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
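A minimal sketch of the shared-ensemble topology described in this comment: both the RMStateStore and the ZK-based FederationStateStore resolve their ensemble from the single hadoop.zk.address key, so one connection string serves both. The keys and class names follow yarn-default.xml and the Federation docs; verify them against your Hadoop version.

{code:java}
// Sketch: one ZK connection string shared by the RM state store and the
// federation state store, as in the deployment described above.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SharedZkStoresConfig {
  static Configuration sharedEnsemble() {
    Configuration conf = new YarnConfiguration();
    conf.set("hadoop.zk.address", "zk1:2181,zk2:2181,zk3:2181"); // placeholder ensemble
    conf.set("yarn.resourcemanager.store.class",
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
    conf.set("yarn.federation.state-store.class",
        "org.apache.hadoop.yarn.server.federation.store.impl.ZookeeperFederationStateStore");
    return conf;
  }
}
{code}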
[jira] [Updated] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Young Chen updated YARN-8658: - Attachment: YARN-8658.04.patch > Metrics for AMRMClientRelayer inside FederationInterceptor > -- > > Key: YARN-8658 > URL: https://issues.apache.org/jira/browse/YARN-8658 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Young Chen >Priority: Major > Attachments: YARN-8658.01.patch, YARN-8658.02.patch, > YARN-8658.03.patch, YARN-8658.04.patch > > > AMRMClientRelayer (YARN-7900) is introduced for stateful > FederationInterceptor (YARN-7899), to keep track of all pending requests sent > to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to > show the state of things in FederationInterceptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606524#comment-16606524 ] Hadoop QA commented on YARN-8751: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 27m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 32s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 0s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 54s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 98m 21s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8751 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938719/YARN-8751.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f1b1de2e20ea 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / eca1a4b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21780/testReport/ | | Max. process+thread count | 407 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21780/console | | Powered by | Apache Yetus 0.8.0
[jira] [Commented] (YARN-7794) SLSRunner is not loading timeline service jars causing failure
[ https://issues.apache.org/jira/browse/YARN-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606480#comment-16606480 ] Yufei Gu commented on YARN-7794: [~jhung], the patch looks good to me. > SLSRunner is not loading timeline service jars causing failure > -- > > Key: YARN-7794 > URL: https://issues.apache.org/jira/browse/YARN-7794 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler-load-simulator >Affects Versions: 3.1.0 >Reporter: Sunil Govindan >Assignee: Yufei Gu >Priority: Blocker > Fix For: 3.1.0 > > Attachments: YARN-7794-branch-2.001.patch, YARN-7794.001.patch > > > {code:java} > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 13 more > Exception in thread "pool-2-thread-390" java.lang.NoClassDefFoundError: > org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollector > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:443) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:321) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:641){code} > We are getting this error while running SLS. new patch of timelineservice > under share/hadoop/yarn is not loaded in SLS jvm (verified from slsrunner > classpath) > cc/ [~rohithsharma] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8045) Reduce log output from container status calls
[ https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606463#comment-16606463 ] Shane Kumpf commented on YARN-8045: --- Good call, that sounds good to me. > Reduce log output from container status calls > - > > Key: YARN-8045 > URL: https://issues.apache.org/jira/browse/YARN-8045 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Shane Kumpf >Priority: Major > > Each time a container's status is returned a log entry is produced in the NM > from {{ContainerManagerImpl}}. The container status includes the diagnostics > field for the container. If the diagnostics field contains an exception, it > can appear as if the exception is logged repeatedly every second. The > diagnostics message can also span many lines, which puts pressure on the logs > and makes it harder to read. > For example: > {code} > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting container-status for container_e01_1521323860653_0001_01_05 > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Returning ContainerStatus: [ContainerId: > container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: > RUNNING, Capability: , Diagnostics: [2018-03-17 > 22:01:00.675]Exception from container-launch. > Container id: container_e01_1521323860653_0001_01_05 > Exit code: -1 > Exception message: > Shell ouput: > [2018-03-17 22:01:00.750]Diagnostic message from attempt : > [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1. > , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8045) Reduce log output from container status calls
[ https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606410#comment-16606410 ] Craig Condit commented on YARN-8045: [~shaneku...@gmail.com], as a compromise, I think we can maintain compatibility by adding a bit of logic to the logging – still log message at INFO, but replace diagnostic content with '...' if DEBUG logging is not enabled. This shouldn't trip up parsers and would still give administrators the ability to turn it on if necessary. > Reduce log output from container status calls > - > > Key: YARN-8045 > URL: https://issues.apache.org/jira/browse/YARN-8045 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Shane Kumpf >Priority: Major > > Each time a container's status is returned a log entry is produced in the NM > from {{ContainerManagerImpl}}. The container status includes the diagnostics > field for the container. If the diagnostics field contains an exception, it > can appear as if the exception is logged repeatedly every second. The > diagnostics message can also span many lines, which puts pressure on the logs > and makes it harder to read. > For example: > {code} > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting container-status for container_e01_1521323860653_0001_01_05 > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Returning ContainerStatus: [ContainerId: > container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: > RUNNING, Capability: , Diagnostics: [2018-03-17 > 22:01:00.675]Exception from container-launch. > Container id: container_e01_1521323860653_0001_01_05 > Exit code: -1 > Exception message: > Shell ouput: > [2018-03-17 22:01:00.750]Diagnostic message from attempt : > [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1. > , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
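A minimal sketch of that compromise, assuming a helper along these lines in ContainerManagerImpl (the method and message shape are illustrative, not the actual patch):

{code:java}
// Sketch: keep one INFO entry per status call so existing log parsers still
// match, but elide the multi-line diagnostics unless DEBUG is enabled.
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class StatusLogSketch {
  private static final Logger LOG = LoggerFactory.getLogger(StatusLogSketch.class);

  static void logContainerStatus(ContainerStatus status) {
    if (LOG.isDebugEnabled()) {
      LOG.info("Returning ContainerStatus: {}", status); // full diagnostics
    } else {
      LOG.info("Returning ContainerStatus: [ContainerId: {}, State: {}, "
          + "Diagnostics: ..., ExitStatus: {}]",
          status.getContainerId(), status.getState(), status.getExitStatus());
    }
  }
}
{code}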
[jira] [Commented] (YARN-8045) Reduce log output from container status calls
[ https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606394#comment-16606394 ] Shane Kumpf commented on YARN-8045: --- Thanks for the proposal [~ccondit-target]. Moving the meat of the diagnostics field to DEBUG makes sense to me and would meet the requirement with minimal change. My one concern is how that might impact compatibility. HADOOP-13714 recently updated the [compatibility guide|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md#log-output], which includes logs. Given that logs are considered Unstable, I think we are safe, but there is a note about ensuring existing parsers don't break. Can we consider the parser requirement in moving this entry to DEBUG? > Reduce log output from container status calls > - > > Key: YARN-8045 > URL: https://issues.apache.org/jira/browse/YARN-8045 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Shane Kumpf >Priority: Major > > Each time a container's status is returned a log entry is produced in the NM > from {{ContainerManagerImpl}}. The container status includes the diagnostics > field for the container. If the diagnostics field contains an exception, it > can appear as if the exception is logged repeatedly every second. The > diagnostics message can also span many lines, which puts pressure on the logs > and makes it harder to read. > For example: > {code} > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting container-status for container_e01_1521323860653_0001_01_05 > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Returning ContainerStatus: [ContainerId: > container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: > RUNNING, Capability: , Diagnostics: [2018-03-17 > 22:01:00.675]Exception from container-launch. > Container id: container_e01_1521323860653_0001_01_05 > Exit code: -1 > Exception message: > Shell ouput: > [2018-03-17 22:01:00.750]Diagnostic message from attempt : > [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1. > , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8666) [UI2] Remove application tab from Yarn Queue Page
[ https://issues.apache.org/jira/browse/YARN-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606329#comment-16606329 ] Yesha Vora commented on YARN-8666: -- Patch updated to remove "Applications" from the Queue page. The screenshot after removing "Applications" is attached. > [UI2] Remove application tab from Yarn Queue Page > - > > Key: YARN-8666 > URL: https://issues.apache.org/jira/browse/YARN-8666 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-08-14 at 3.43.18 PM.png, Screen Shot > 2018-09-06 at 12.50.14 PM.png, YARN-8666.001.patch > > > The Yarn UI2 Queue page shows an Applications button. This button does not > redirect to any other page. In addition, the running-applications table is > already available on the same page. > Thus, there is no need for an Applications button on the Queue page. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8666) [UI2] Remove application tab from Yarn Queue Page
[ https://issues.apache.org/jira/browse/YARN-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8666: - Attachment: YARN-8666.001.patch > [UI2] Remove application tab from Yarn Queue Page > - > > Key: YARN-8666 > URL: https://issues.apache.org/jira/browse/YARN-8666 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-08-14 at 3.43.18 PM.png, Screen Shot > 2018-09-06 at 12.50.14 PM.png, YARN-8666.001.patch > > > The Yarn UI2 Queue page shows an Applications button. This button does not > redirect to any other page. In addition, the running-applications table is > already available on the same page. > Thus, there is no need for an Applications button on the Queue page. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8666) [UI2] Remove application tab from Yarn Queue Page
[ https://issues.apache.org/jira/browse/YARN-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora updated YARN-8666: - Attachment: Screen Shot 2018-09-06 at 12.50.14 PM.png > [UI2] Remove application tab from Yarn Queue Page > - > > Key: YARN-8666 > URL: https://issues.apache.org/jira/browse/YARN-8666 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-08-14 at 3.43.18 PM.png, Screen Shot > 2018-09-06 at 12.50.14 PM.png, YARN-8666.001.patch > > > The Yarn UI2 Queue page shows an Applications button. This button does not > redirect to any other page. In addition, the running-applications table is > already available on the same page. > Thus, there is no need for an Applications button on the Queue page. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-3879) [Storage implementation] Create HDFS backing storage implementation for ATS reads
[ https://issues.apache.org/jira/browse/YARN-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606294#comment-16606294 ] Vrushali C edited comment on YARN-3879 at 9/6/18 7:27 PM: -- Thanks [~abmodi] ! Patch looks good overall. If you are going to update it, then I would suggest using File.separator instead of an actual "/" . If the patch does not need any updating, then let's leave it. Overall +1 on the patch was (Author: vrushalic): Patch looks good overall. If you are going to update it, then I would suggest using File.separator instead of an actual "/" . If the patch does not need any updating, then let's leave it. Overall +1 on the patch > [Storage implementation] Create HDFS backing storage implementation for ATS > reads > - > > Key: YARN-3879 > URL: https://issues.apache.org/jira/browse/YARN-3879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Tsuyoshi Ozawa >Assignee: Abhishek Modi >Priority: Major > Labels: YARN-5355 > Attachments: YARN-3879-YARN-7055.001.patch, YARN-3879.001.patch, > YARN-3879.002.patch, YARN-3879.003.patch, YARN-3879.004.patch > > > Reader version of YARN-3841 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3879) [Storage implementation] Create HDFS backing storage implementation for ATS reads
[ https://issues.apache.org/jira/browse/YARN-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606294#comment-16606294 ] Vrushali C commented on YARN-3879: -- Patch looks good overall. If you are going to update it, then I would suggest using File.separator instead of an actual "/" . If the patch does not need any updating, then let's leave it. Overall +1 on the patch > [Storage implementation] Create HDFS backing storage implementation for ATS > reads > - > > Key: YARN-3879 > URL: https://issues.apache.org/jira/browse/YARN-3879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Tsuyoshi Ozawa >Assignee: Abhishek Modi >Priority: Major > Labels: YARN-5355 > Attachments: YARN-3879-YARN-7055.001.patch, YARN-3879.001.patch, > YARN-3879.002.patch, YARN-3879.003.patch, YARN-3879.004.patch > > > Reader version of YARN-3841 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
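The File.separator suggestion in code form, with an illustrative path layout (the directory structure here is hypothetical, not the writer's actual layout):

{code:java}
// Sketch: join path segments with File.separator instead of a hard-coded "/"
// so path construction stays platform-independent.
import java.io.File;

public class PathJoinExample {
  static String entityDir(String rootDir, String clusterId, String appId) {
    return rootDir + File.separator + clusterId + File.separator + appId;
  }
}
{code}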
[jira] [Comment Edited] (YARN-3841) [Storage implementation] Adding retry semantics to HDFS backing storage
[ https://issues.apache.org/jira/browse/YARN-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606291#comment-16606291 ] Vrushali C edited comment on YARN-3841 at 9/6/18 7:25 PM: -- Thanks [~abmodi] ! Patch looks good overall. A couple of minor comments: - let's use File.separator instead of an actual "/" - For FileSystemTimelineWriterImpl.java, I think we may not want to do a fs.close(); at line 261. This will close the FileSystem handle for all threads in that process since this is a static instance. - For line281, instead of {{ LOG.info("Retrying operation on FS. Retry no. " + retry); }} we could perhaps update it to {{ "Will retry operation on FS. Retry no. " + retry + " after sleeping for " + fsRetryInterval + " seconds" ); }} Will be a better indication of the sleep & retry. What do you think? was (Author: vrushalic): Thanks [~abmodi] ! Patch looks good overall. A couple of minor comments: - let's use File.separator instead of an actual "/" - For FileSystemTimelineWriterImpl.java, I think we may not want to do a fs.close(); at line 261. This will close the FileSystem handle for all threads in that process since this is a static instance. - For line281, instead of {{monospaced}} LOG.info("Retrying operation on FS. Retry no. " + retry); {{monospaced}} we could perhaps update it to {{monospaced}} "Will retry operation on FS. Retry no. " + retry + " after sleeping for " + fsRetryInterval + " seconds" ); {{monospaced}} Will be a better indication of the sleep & retry. What do you think? > [Storage implementation] Adding retry semantics to HDFS backing storage > --- > > Key: YARN-3841 > URL: https://issues.apache.org/jira/browse/YARN-3841 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Tsuyoshi Ozawa >Assignee: Abhishek Modi >Priority: Major > Labels: YARN-5355 > Attachments: YARN-3841-YARN-7055.002.patch, YARN-3841.001.patch, > YARN-3841.002.patch, YARN-3841.003.patch, YARN-3841.004.patch > > > HDFS backing storage is useful for following scenarios. > 1. For Hadoop clusters which don't run HBase. > 2. For fallback from HBase when HBase cluster is temporary unavailable. > Quoting ATS design document of YARN-2928: > {quote} > In the case the HBase > storage is not available, the plugin should buffer the writes temporarily > (e.g. HDFS), and flush > them once the storage comes back online. Reading and writing to hdfs as the > the backup storage > could potentially use the HDFS writer plugin unless the complexity of > generalizing the HDFS > writer plugin for this purpose exceeds the benefits of reusing it here. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-3841) [Storage implementation] Adding retry semantics to HDFS backing storage
[ https://issues.apache.org/jira/browse/YARN-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606291#comment-16606291 ] Vrushali C edited comment on YARN-3841 at 9/6/18 7:25 PM: -- Thanks [~abmodi] ! Patch looks good overall. A couple of minor comments: - let's use File.separator instead of an actual "/" - For FileSystemTimelineWriterImpl.java, I think we may not want to do a fs.close(); at line 261. This will close the FileSystem handle for all threads in that process since this is a static instance. - For line281, instead of {{LOG.info("Retrying operation on FS. Retry no. " + retry);}} we could perhaps update it to {{"Will retry operation on FS. Retry no. " + retry + " after sleeping for " + fsRetryInterval + " seconds" );}} Will be a better indication of the sleep & retry. What do you think? was (Author: vrushalic): Thanks [~abmodi] ! Patch looks good overall. A couple of minor comments: - let's use File.separator instead of an actual "/" - For FileSystemTimelineWriterImpl.java, I think we may not want to do a fs.close(); at line 261. This will close the FileSystem handle for all threads in that process since this is a static instance. - For line281, instead of {{ LOG.info("Retrying operation on FS. Retry no. " + retry); }} we could perhaps update it to {{ "Will retry operation on FS. Retry no. " + retry + " after sleeping for " + fsRetryInterval + " seconds" ); }} Will be a better indication of the sleep & retry. What do you think? > [Storage implementation] Adding retry semantics to HDFS backing storage > --- > > Key: YARN-3841 > URL: https://issues.apache.org/jira/browse/YARN-3841 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Tsuyoshi Ozawa >Assignee: Abhishek Modi >Priority: Major > Labels: YARN-5355 > Attachments: YARN-3841-YARN-7055.002.patch, YARN-3841.001.patch, > YARN-3841.002.patch, YARN-3841.003.patch, YARN-3841.004.patch > > > HDFS backing storage is useful for following scenarios. > 1. For Hadoop clusters which don't run HBase. > 2. For fallback from HBase when HBase cluster is temporary unavailable. > Quoting ATS design document of YARN-2928: > {quote} > In the case the HBase > storage is not available, the plugin should buffer the writes temporarily > (e.g. HDFS), and flush > them once the storage comes back online. Reading and writing to hdfs as the > the backup storage > could potentially use the HDFS writer plugin unless the complexity of > generalizing the HDFS > writer plugin for this purpose exceeds the benefits of reusing it here. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3841) [Storage implementation] Adding retry semantics to HDFS backing storage
[ https://issues.apache.org/jira/browse/YARN-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606291#comment-16606291 ] Vrushali C commented on YARN-3841: -- Thanks [~abmodi] ! Patch looks good overall. A couple of minor comments: - let's use File.separator instead of an actual "/" - For FileSystemTimelineWriterImpl.java, I think we may not want to do a fs.close(); at line 261. This will close the FileSystem handle for all threads in that process since this is a static instance. - For line 281, instead of {{LOG.info("Retrying operation on FS. Retry no. " + retry);}} we could perhaps update it to {{"Will retry operation on FS. Retry no. " + retry + " after sleeping for " + fsRetryInterval + " seconds");}} Will be a better indication of the sleep & retry. What do you think? > [Storage implementation] Adding retry semantics to HDFS backing storage > --- > > Key: YARN-3841 > URL: https://issues.apache.org/jira/browse/YARN-3841 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Tsuyoshi Ozawa >Assignee: Abhishek Modi >Priority: Major > Labels: YARN-5355 > Attachments: YARN-3841-YARN-7055.002.patch, YARN-3841.001.patch, > YARN-3841.002.patch, YARN-3841.003.patch, YARN-3841.004.patch > > > HDFS backing storage is useful for following scenarios. > 1. For Hadoop clusters which don't run HBase. > 2. For fallback from HBase when HBase cluster is temporary unavailable. > Quoting ATS design document of YARN-2928: > {quote} > In the case the HBase > storage is not available, the plugin should buffer the writes temporarily > (e.g. HDFS), and flush > them once the storage comes back online. Reading and writing to hdfs as the > the backup storage > could potentially use the HDFS writer plugin unless the complexity of > generalizing the HDFS > writer plugin for this purpose exceeds the benefits of reusing it here. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
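Taken together, the review comments suggest a retry loop roughly like the following. The retry count and interval here are assumptions standing in for the patch's configuration, and the real change belongs in FileSystemTimelineWriterImpl; note the loop rethrows once retries are exhausted and never closes the shared static FileSystem instance.

{code:java}
// Sketch of bounded FS retry semantics per the review: sleep between attempts,
// log the upcoming retry, and surface the failure when retries run out.
import java.io.IOException;

public class FsRetrySketch {
  private static final int FS_NUM_RETRIES = 5;            // assumed default
  private static final long FS_RETRY_INTERVAL_MS = 1000L; // assumed default

  interface FsOperation {
    void run() throws IOException;
  }

  static void withRetries(FsOperation op) throws IOException, InterruptedException {
    for (int retry = 1; ; retry++) {
      try {
        op.run();
        return;
      } catch (IOException e) {
        if (retry > FS_NUM_RETRIES) {
          throw e; // retries exhausted
        }
        System.out.println("Will retry operation on FS. Retry no. " + retry
            + " after sleeping for " + FS_RETRY_INTERVAL_MS + " ms");
        Thread.sleep(FS_RETRY_INTERVAL_MS);
      }
    }
  }
}
{code}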
[jira] [Comment Edited] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606270#comment-16606270 ] Craig Condit edited comment on YARN-8751 at 9/6/18 7:02 PM: [~shaneku...@gmail.com], looks like we have consensus on the approach. I can take this one. was (Author: ccondit-target): [~shaneku...@gmail.com], looks like have consensus on the approach. I can take this one. > Container-executor permission check errors cause the NM to be marked unhealthy > -- > > Key: YARN-8751 > URL: https://issues.apache.org/jira/browse/YARN-8751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Priority: Critical > Labels: Docker > > {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a > NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by > {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception > occurs based on the exit code returned by container-executor, and 7 different > exit codes cause the NM to be marked UNHEALTHY. > {code:java} > if (exitCode == > ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || > exitCode == > ExitCode.INVALID_CONFIG_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { > throw new ConfigurationException( > "Linux Container Executor reached unrecoverable exception", e);{code} > I can understand why these are treated as fatal with the existing process > container model. However, with privileged Docker containers this may be too > harsh, as Privileged Docker containers don't guarantee the user's identity > will be propagated into the container, so these mismatches can occur. Outside > of privileged containers, an application may inadvertently change the > permissions on one of these directories, triggering this condition. > In our case, a container changed the "appcache//" > directory permissions to 774. Some time later, the process in the container > died and the Retry Policy kicked in to RELAUNCH the container. When the > RELAUNCH occurred, container-executor checked the permissions of the > "appcache//" directory (the existing workdir is retained > for RELAUNCH) and returned exit code 35. Exit code 35 is > COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all > containers running on that node, when really only this container would have > been impacted. > {code:java} > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. 
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e15_1535130383425_0085_01_05 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 35 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch > container failed > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not > create container dirsCould not create local files and directories 5 6 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : run as user is user > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating script paths... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating local dirs... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Path > /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05 > has permission 774 but needs per > mission 750. > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor >
[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606270#comment-16606270 ] Craig Condit commented on YARN-8751: [~shaneku...@gmail.com], looks like have consensus on the approach. I can take this one. > Container-executor permission check errors cause the NM to be marked unhealthy > -- > > Key: YARN-8751 > URL: https://issues.apache.org/jira/browse/YARN-8751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Priority: Critical > Labels: Docker > > {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a > NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by > {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception > occurs based on the exit code returned by container-executor, and 7 different > exit codes cause the NM to be marked UNHEALTHY. > {code:java} > if (exitCode == > ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || > exitCode == > ExitCode.INVALID_CONFIG_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { > throw new ConfigurationException( > "Linux Container Executor reached unrecoverable exception", e);{code} > I can understand why these are treated as fatal with the existing process > container model. However, with privileged Docker containers this may be too > harsh, as Privileged Docker containers don't guarantee the user's identity > will be propagated into the container, so these mismatches can occur. Outside > of privileged containers, an application may inadvertently change the > permissions on one of these directories, triggering this condition. > In our case, a container changed the "appcache//" > directory permissions to 774. Some time later, the process in the container > died and the Retry Policy kicked in to RELAUNCH the container. When the > RELAUNCH occurred, container-executor checked the permissions of the > "appcache//" directory (the existing workdir is retained > for RELAUNCH) and returned exit code 35. Exit code 35 is > COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all > containers running on that node, when really only this container would have > been impacted. > {code:java} > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. 
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e15_1535130383425_0085_01_05 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 35 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch > container failed > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not > create container dirsCould not create local files and directories 5 6 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : run as user is user > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating script paths... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating local dirs... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Path > /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05 > has permission 774 but needs per > mission 750. > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null) > 2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch > (ContainerRelaunch.java:call(129)) -
[jira] [Assigned] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit reassigned YARN-8751: -- Assignee: Craig Condit > Container-executor permission check errors cause the NM to be marked unhealthy > -- > > Key: YARN-8751 > URL: https://issues.apache.org/jira/browse/YARN-8751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Assignee: Craig Condit >Priority: Critical > Labels: Docker > > {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a > NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by > {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception > occurs based on the exit code returned by container-executor, and 7 different > exit codes cause the NM to be marked UNHEALTHY. > {code:java} > if (exitCode == > ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || > exitCode == > ExitCode.INVALID_CONFIG_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { > throw new ConfigurationException( > "Linux Container Executor reached unrecoverable exception", e);{code} > I can understand why these are treated as fatal with the existing process > container model. However, with privileged Docker containers this may be too > harsh, as Privileged Docker containers don't guarantee the user's identity > will be propagated into the container, so these mismatches can occur. Outside > of privileged containers, an application may inadvertently change the > permissions on one of these directories, triggering this condition. > In our case, a container changed the "appcache//" > directory permissions to 774. Some time later, the process in the container > died and the Retry Policy kicked in to RELAUNCH the container. When the > RELAUNCH occurred, container-executor checked the permissions of the > "appcache//" directory (the existing workdir is retained > for RELAUNCH) and returned exit code 35. Exit code 35 is > COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all > containers running on that node, when really only this container would have > been impacted. > {code:java} > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. 
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e15_1535130383425_0085_01_05 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 35 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch > container failed > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not > create container dirsCould not create local files and directories 5 6 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : run as user is user > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating script paths... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating local dirs... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Path > /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05 > has permission 774 but needs per > mission 750. > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null) > 2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch > (ContainerRelaunch.java:call(129)) - Failed to launch container due to > configuration error. >
[jira] [Commented] (YARN-8718) Merge related work for YARN-3409
[ https://issues.apache.org/jira/browse/YARN-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606260#comment-16606260 ] Hadoop QA commented on YARN-8718: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 1s{color} | {color:green} The patch appears to include 31 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 0s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 16m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 34s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 15m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 26s{color} | {color:green} root generated 0 new + 1453 unchanged - 1 fixed = 1453 total (was 1454) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 53s{color} | {color:orange} root: The patch generated 14 new + 1590 unchanged - 54 fixed = 1604 total (was 1644) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 13m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 28s{color} | {color:green} There were no new shellcheck issues. 
{color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 14s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 2 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 17s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 12m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 18s{color} | {color:green} hadoop-project in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 48s{color} | {color:green} hadoop-common
[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606189#comment-16606189 ] Shane Kumpf commented on YARN-8751: --- Thanks for the feedback and suggestions everyone. I think the issue is most likely to happen under relaunch conditions with a poorly behaving container (as noted by [~eyang]). Relaunch (afaik) is only used by YARN Services today, so the impact may be isolated. Having said that, based on the conversation here, it does appear there are other non-fatal cases that could trigger these errors, so I'm +1 on the proposal from [~jlowe] affecting both launch and relaunch. > Container-executor permission check errors cause the NM to be marked unhealthy > -- > > Key: YARN-8751 > URL: https://issues.apache.org/jira/browse/YARN-8751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Priority: Critical > Labels: Docker > > {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a > NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by > {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception > occurs based on the exit code returned by container-executor, and 7 different > exit codes cause the NM to be marked UNHEALTHY. > {code:java} > if (exitCode == > ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || > exitCode == > ExitCode.INVALID_CONFIG_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { > throw new ConfigurationException( > "Linux Container Executor reached unrecoverable exception", e);{code} > I can understand why these are treated as fatal with the existing process > container model. However, with privileged Docker containers this may be too > harsh, as Privileged Docker containers don't guarantee the user's identity > will be propagated into the container, so these mismatches can occur. Outside > of privileged containers, an application may inadvertently change the > permissions on one of these directories, triggering this condition. > In our case, a container changed the "appcache//" > directory permissions to 774. Some time later, the process in the container > died and the Retry Policy kicked in to RELAUNCH the container. When the > RELAUNCH occurred, container-executor checked the permissions of the > "appcache//" directory (the existing workdir is retained > for RELAUNCH) and returned exit code 35. Exit code 35 is > COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all > containers running on that node, when really only this container would have > been impacted. > {code:java} > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. 
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e15_1535130383425_0085_01_05 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 35 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch > container failed > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not > create container dirsCould not create local files and directories 5 6 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : run as user is user > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating script paths... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating local dirs... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Path >
[jira] [Updated] (YARN-8200) Backport resource types/GPU features to branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-8200: Attachment: YARN-8200-branch-2.001.patch > Backport resource types/GPU features to branch-2 > > > Key: YARN-8200 > URL: https://issues.apache.org/jira/browse/YARN-8200 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-8200-branch-2.001.patch, > counter.scheduler.operation.allocate.csv.defaultResources, > counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json > > > Currently we have a need for GPU scheduling on our YARN clusters to support > deep learning workloads. However, our main production clusters are running > older versions of branch-2 (2.7 in our case). To prevent supporting too many > very different hadoop versions across multiple clusters, we would like to > backport the resource types/resource profiles feature to branch-2, as well as > the GPU specific support. > > We have done a trial backport of YARN-3926 and some miscellaneous patches in > YARN-7069 based on issues we uncovered, and the backport was fairly smooth. > We also did a trial backport of most of YARN-6223 (sans docker support). > > Regarding the backports, perhaps we can do the development in a feature > branch and then merge to branch-2 when ready. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606091#comment-16606091 ] Jonathan Hung commented on YARN-8200: - Rebased YARN-8200 on branch-2. Attached the full diff between branch-2 and YARN-8200 (001) > Backport resource types/GPU features to branch-2 > > > Key: YARN-8200 > URL: https://issues.apache.org/jira/browse/YARN-8200 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-8200-branch-2.001.patch, > counter.scheduler.operation.allocate.csv.defaultResources, > counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json > > > Currently we have a need for GPU scheduling on our YARN clusters to support > deep learning workloads. However, our main production clusters are running > older versions of branch-2 (2.7 in our case). To prevent supporting too many > very different hadoop versions across multiple clusters, we would like to > backport the resource types/resource profiles feature to branch-2, as well as > the GPU specific support. > > We have done a trial backport of YARN-3926 and some miscellaneous patches in > YARN-7069 based on issues we uncovered, and the backport was fairly smooth. > We also did a trial backport of most of YARN-6223 (sans docker support). > > Regarding the backports, perhaps we can do the development in a feature > branch and then merge to branch-2 when ready. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606089#comment-16606089 ] Eric Yang commented on YARN-8751: - +1 on [~jlowe]'s proposal that only INVALID_CONTAINER_EXEC_PERMISSIONS and INVALID_CONFIG_FILE throw ConfigurationException. The other exit codes are non-fatal and should remain best-effort retries even when the system is running under unfavorable conditions. (A sketch of the narrowed check follows the quoted description below.) > Container-executor permission check errors cause the NM to be marked unhealthy > -- > > Key: YARN-8751 > URL: https://issues.apache.org/jira/browse/YARN-8751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Priority: Critical > Labels: Docker > > {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a > NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by > {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception > occurs based on the exit code returned by container-executor, and 7 different > exit codes cause the NM to be marked UNHEALTHY. > {code:java} > if (exitCode == > ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || > exitCode == > ExitCode.INVALID_CONFIG_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { > throw new ConfigurationException( > "Linux Container Executor reached unrecoverable exception", e);{code} > I can understand why these are treated as fatal with the existing process > container model. However, with privileged Docker containers this may be too > harsh, as Privileged Docker containers don't guarantee the user's identity > will be propagated into the container, so these mismatches can occur. Outside > of privileged containers, an application may inadvertently change the > permissions on one of these directories, triggering this condition. > In our case, a container changed the "appcache//" > directory permissions to 774. Some time later, the process in the container > died and the Retry Policy kicked in to RELAUNCH the container. When the > RELAUNCH occurred, container-executor checked the permissions of the > "appcache//" directory (the existing workdir is retained > for RELAUNCH) and returned exit code 35. Exit code 35 is > COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all > containers running on that node, when really only this container would have > been impacted. > {code:java} > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e15_1535130383425_0085_01_05 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 35 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch > container failed > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not > create container dirsCould not create local files and directories 5 6 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : run as user is user > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating script paths... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating local dirs... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Path > /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05 > has permission 774 but needs per > mission 750. > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor >
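For reference, the narrowing proposed by [~jlowe] and seconded above would look roughly like the following. This is a sketch of the idea under discussion, not a committed change:
{code:java}
// Sketch of the proposal: only genuine misconfigurations remain fatal to
// the NM; the directory/file creation failures fall through and fail just
// this container launch instead of marking the whole node UNHEALTHY.
if (exitCode == ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode()
    || exitCode == ExitCode.INVALID_CONFIG_FILE.getExitCode()) {
  throw new ConfigurationException(
      "Linux Container Executor reached unrecoverable exception", e);
}
// COULD_NOT_CREATE_* exit codes are handled below as per-container
// launch failures, leaving the NodeManager healthy.
{code}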
[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606007#comment-16606007 ] Craig Condit commented on YARN-8751: Each of these error codes could have any number of root causes, ranging from transient to task-specific, disk-specific, node-specific, or cluster-level. Trying to do root-cause analysis of OS-level failures in code isn't really practical. No two environments are alike, and it's going to be very difficult to set a policy which makes sense for all clusters. This is where things like admin-provided health check scripts come into play. These can check things like disks available, disks non-full, permissions (at top-level dirs) set correctly, etc. That said, I think we should have defaults which cause the least amount of pain in the majority of cases. It seems to me that in most cases, it's far more likely to be a transient or per-disk issue causing these failures than a global misconfiguration, so not failing the NM makes sense. As a way to address detection of the specific issue mentioned in this JIRA, top-level permissions on NM-controlled dirs could be validated on startup (if they aren't already) and cause an NM failure at that point (or at least mark the specific disk bad); a sketch of such a check follows the quoted description below. This would give fail-fast behavior for something that is clearly configured wrong globally. It would also make these issues, when they occur at the container level, far more likely to be transient or task/app-specific. > Container-executor permission check errors cause the NM to be marked unhealthy > -- > > Key: YARN-8751 > URL: https://issues.apache.org/jira/browse/YARN-8751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Priority: Critical > Labels: Docker > > {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a > NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by > {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception > occurs based on the exit code returned by container-executor, and 7 different > exit codes cause the NM to be marked UNHEALTHY. > {code:java} > if (exitCode == > ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || > exitCode == > ExitCode.INVALID_CONFIG_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { > throw new ConfigurationException( > "Linux Container Executor reached unrecoverable exception", e);{code} > I can understand why these are treated as fatal with the existing process > container model. However, with privileged Docker containers this may be too > harsh, as Privileged Docker containers don't guarantee the user's identity > will be propagated into the container, so these mismatches can occur. Outside > of privileged containers, an application may inadvertently change the > permissions on one of these directories, triggering this condition. > In our case, a container changed the "appcache//" > directory permissions to 774. Some time later, the process in the container > died and the Retry Policy kicked in to RELAUNCH the container.
When the > RELAUNCH occurred, container-executor checked the permissions of the > "appcache//" directory (the existing workdir is retained > for RELAUNCH) and returned exit code 35. Exit code 35 is > COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all > containers running on that node, when really only this container would have > been impacted. > {code:java} > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e15_1535130383425_0085_01_05 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 35 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch > container failed > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not > create container dirsCould not create local files and directories 5 6 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor >
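The fail-fast startup validation suggested in the comment above might look something like this sketch; the class, method, directory list, and the write-permission expectation are illustrative assumptions, not existing NM code:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.util.List;
import java.util.Set;

// Sketch: validate top-level NM-controlled dirs once at startup so a
// global misconfiguration fails fast (or marks the disk bad) instead of
// surfacing later as per-container launch failures.
public final class LocalDirPermissionCheck {
  static void validateLocalDirs(List<String> nmLocalDirs) throws IOException {
    for (String dir : nmLocalDirs) {
      Path p = Paths.get(dir);
      Set<PosixFilePermission> perms = Files.getPosixFilePermissions(p);
      if (perms.contains(PosixFilePermission.GROUP_WRITE)
          || perms.contains(PosixFilePermission.OTHERS_WRITE)) {
        // Fail NM startup (or exclude this disk) before containers run.
        throw new IOException("Local dir " + dir
            + " has unsafe permissions: " + perms);
      }
    }
  }
}
{code}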
[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606003#comment-16606003 ] Eric Yang commented on YARN-8751: - [~shaneku...@gmail.com] I believe the COULD_NOT_CREATE_WORK_DIRECTORIES exit code should only be raised after the option has been exhausted on all disks. The introduction of relaunch may single out a single working directory and report a false positive, while the system may still have the option to fall back and create a new working directory on another disk to move forward. I am not sure if the test system has more than one local disk. If it only had one disk, it may appear that this single container crashes the node manager. If relaunch doesn't retry other disks, then it is a bug, and the container-executor logic should be changed to detect such a case and create the working directory on another disk. This is similar to the fault-tolerance design in HDFS: relaunch makes a best effort to reuse the same working directory, but uses another data directory if the current one has turned bad. (A sketch of this fallback follows the quoted description below.) Let's look at the problem from a different angle: the container is performing destructive operations on its working directory and knocking out all disks by abusing relaunch. This looks more like a deliberate attempt to sabotage the system. In that case, it is really the system administrator's responsibility not to grant privileged containers to such badly behaved users/images. This is the same as saying: don't hand them a chainsaw if you know they are irresponsible individuals. There is little that can be done to protect irresponsible individuals from themselves. You can only protect them by not giving them too much power. Disabling write mounts for privileged containers is the wrong option, because there are real programs running multi-user containers that depend on the privileged container feature. If the badly behaved program is a QA test, then we may just have to say: we handed you a chainsaw, read the instructions and be careful with it. > Container-executor permission check errors cause the NM to be marked unhealthy > -- > > Key: YARN-8751 > URL: https://issues.apache.org/jira/browse/YARN-8751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Priority: Critical > Labels: Docker > > {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a > NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by > {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception > occurs based on the exit code returned by container-executor, and 7 different > exit codes cause the NM to be marked UNHEALTHY. > {code:java} > if (exitCode == > ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || > exitCode == > ExitCode.INVALID_CONFIG_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { > throw new ConfigurationException( > "Linux Container Executor reached unrecoverable exception", e);{code} > I can understand why these are treated as fatal with the existing process > container model. However, with privileged Docker containers this may be too > harsh, as Privileged Docker containers don't guarantee the user's identity > will be propagated into the container, so these mismatches can occur.
Outside > of privileged containers, an application may inadvertently change the > permissions on one of these directories, triggering this condition. > In our case, a container changed the "appcache//" > directory permissions to 774. Some time later, the process in the container > died and the Retry Policy kicked in to RELAUNCH the container. When the > RELAUNCH occurred, container-executor checked the permissions of the > "appcache//" directory (the existing workdir is retained > for RELAUNCH) and returned exit code 35. Exit code 35 is > COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all > containers running on that node, when really only this container would have > been impacted. > {code:java} > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e15_1535130383425_0085_01_05 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor >
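The fall-back behavior described above (reuse the old workdir if possible, otherwise move to another disk) could in principle be expressed as follows; {{pickWorkDir}} and its arguments are hypothetical, not current container-executor or NM code:
{code:java}
import java.io.File;
import java.io.IOException;
import java.util.List;

// Sketch: on relaunch, make a best effort to reuse the previous workdir,
// but fall back to creating a fresh one on another local disk when the
// old directory has turned bad, mirroring HDFS-style disk failover.
public final class WorkDirSelector {
  static File pickWorkDir(List<String> localDirs, File previousWorkDir)
      throws IOException {
    if (previousWorkDir.isDirectory() && previousWorkDir.canWrite()) {
      return previousWorkDir;        // reuse the existing workdir
    }
    for (String dir : localDirs) {   // otherwise try the remaining disks
      File candidate = new File(dir, previousWorkDir.getName());
      if (candidate.isDirectory() || candidate.mkdirs()) {
        return candidate;
      }
    }
    throw new IOException("No usable local dir for container workdir");
  }
}
{code}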
[jira] [Commented] (YARN-8680) YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate
[ https://issues.apache.org/jira/browse/YARN-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605998#comment-16605998 ] Pradeep Ambati commented on YARN-8680: -- Thanks for the review, [~jlowe]! I have addressed all the issues you raised in the latest patch. > YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate > - > > Key: YARN-8680 > URL: https://issues.apache.org/jira/browse/YARN-8680 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Pradeep Ambati >Assignee: Pradeep Ambati >Priority: Critical > Attachments: YARN-8680.00.patch, YARN-8680.01.patch, > YARN-8680.02.patch, YARN-8680.03.patch > > > Similar to YARN-8242, implement iterable abstraction for > LocalResourceTrackerState to load completed and in progress resources when > needed rather than loading them all at a time for a respective state. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
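As a rough illustration of the shape of such an iterable abstraction (the {{RecordCursor}} interface and cursor factory below are hypothetical stand-ins, not the actual leveldb-backed state-store API):
{code:java}
import java.util.Iterator;
import java.util.function.Supplier;

// Sketch: expose recovered local resources as a lazy Iterable so callers
// pull records one at a time instead of materializing the whole list of
// completed/in-progress resources up front during NM recovery.
final class LazyTrackerState<T> implements Iterable<T> {
  interface RecordCursor<T> {        // hypothetical cursor over the store
    boolean hasNext();
    T next();
  }

  private final Supplier<RecordCursor<T>> cursorFactory;

  LazyTrackerState(Supplier<RecordCursor<T>> cursorFactory) {
    this.cursorFactory = cursorFactory;
  }

  @Override
  public Iterator<T> iterator() {
    RecordCursor<T> cursor = cursorFactory.get();  // opened lazily per scan
    return new Iterator<T>() {
      @Override public boolean hasNext() { return cursor.hasNext(); }
      @Override public T next() { return cursor.next(); }
    };
  }
}
{code}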
[jira] [Commented] (YARN-8680) YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate
[ https://issues.apache.org/jira/browse/YARN-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605993#comment-16605993 ] Hadoop QA commented on YARN-8680: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 30s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 237 unchanged - 5 fixed = 237 total (was 242) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 47s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 45s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 67m 40s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8680 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938657/YARN-8680.03.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 15efbab9fa02 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b6c543f | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21778/testReport/ | | Max. process+thread count | 468 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21778/console | | Powered by | Apache Yetus
[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605969#comment-16605969 ] Eric Badger commented on YARN-8751: --- I agree that we shouldn't kill the NM because of something like bad permissions that only affects a single job. If that is possible, then a user could pretty easily bring down the entire cluster, which is double plus ungood. However, it would also be nice to still be able to mark the node bad in cases where things are really wrong and will affect all jobs. Just thinking out loud here, but if all of the disks are 100% full, the NM is going to fail every container that runs on it. Yes, NM blacklisting will help, but that has to be re-learned for each application (afaik). It would be nice to detect if the error is actually fatal to all jobs or not. And I'm not sure that's an easy thing to do when it comes to creating directories. Maybe someone else has an idea? > Container-executor permission check errors cause the NM to be marked unhealthy > -- > > Key: YARN-8751 > URL: https://issues.apache.org/jira/browse/YARN-8751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Priority: Critical > Labels: Docker > > {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a > NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by > {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception > occurs based on the exit code returned by container-executor, and 7 different > exit codes cause the NM to be marked UNHEALTHY. > {code:java} > if (exitCode == > ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || > exitCode == > ExitCode.INVALID_CONFIG_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { > throw new ConfigurationException( > "Linux Container Executor reached unrecoverable exception", e);{code} > I can understand why these are treated as fatal with the existing process > container model. However, with privileged Docker containers this may be too > harsh, as Privileged Docker containers don't guarantee the user's identity > will be propagated into the container, so these mismatches can occur. Outside > of privileged containers, an application may inadvertently change the > permissions on one of these directories, triggering this condition. > In our case, a container changed the "appcache//" > directory permissions to 774. Some time later, the process in the container > died and the Retry Policy kicked in to RELAUNCH the container. When the > RELAUNCH occurred, container-executor checked the permissions of the > "appcache//" directory (the existing workdir is retained > for RELAUNCH) and returned exit code 35. Exit code 35 is > COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all > containers running on that node, when really only this container would have > been impacted. > {code:java} > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. 
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e15_1535130383425_0085_01_05 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 35 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch > container failed > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not > create container dirsCould not create local files and directories 5 6 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : run as user is user > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541))
[jira] [Updated] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8751: -- Labels: Docker (was: ) > Container-executor permission check errors cause the NM to be marked unhealthy > -- > > Key: YARN-8751 > URL: https://issues.apache.org/jira/browse/YARN-8751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Priority: Critical > Labels: Docker > > {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a > NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by > {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception > occurs based on the exit code returned by container-executor, and 7 different > exit codes cause the NM to be marked UNHEALTHY. > {code:java} > if (exitCode == > ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || > exitCode == > ExitCode.INVALID_CONFIG_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { > throw new ConfigurationException( > "Linux Container Executor reached unrecoverable exception", e);{code} > I can understand why these are treated as fatal with the existing process > container model. However, with privileged Docker containers this may be too > harsh, as Privileged Docker containers don't guarantee the user's identity > will be propagated into the container, so these mismatches can occur. Outside > of privileged containers, an application may inadvertently change the > permissions on one of these directories, triggering this condition. > In our case, a container changed the "appcache//" > directory permissions to 774. Some time later, the process in the container > died and the Retry Policy kicked in to RELAUNCH the container. When the > RELAUNCH occurred, container-executor checked the permissions of the > "appcache//" directory (the existing workdir is retained > for RELAUNCH) and returned exit code 35. Exit code 35 is > COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all > containers running on that node, when really only this container would have > been impacted. > {code:java} > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. 
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e15_1535130383425_0085_01_05 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 35 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch > container failed > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not > create container dirsCould not create local files and directories 5 6 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : run as user is user > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating script paths... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating local dirs... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Path > /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05 > has permission 774 but needs per > mission 750. > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null) > 2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch > (ContainerRelaunch.java:call(129)) - Failed to launch container due to > configuration error. >
[jira] [Commented] (YARN-8045) Reduce log output from container status calls
[ https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605901#comment-16605901 ] Craig Condit commented on YARN-8045: Might make the code (slightly) more complex, but we could output diagnostics only at DEBUG level and the rest of the message at INFO. > Reduce log output from container status calls > - > > Key: YARN-8045 > URL: https://issues.apache.org/jira/browse/YARN-8045 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Shane Kumpf >Priority: Major > > Each time a container's status is returned a log entry is produced in the NM > from {{ContainerManagerImpl}}. The container status includes the diagnostics > field for the container. If the diagnostics field contains an exception, it > can appear as if the exception is logged repeatedly every second. The > diagnostics message can also span many lines, which puts pressure on the logs > and makes it harder to read. > For example: > {code} > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting container-status for container_e01_1521323860653_0001_01_05 > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Returning ContainerStatus: [ContainerId: > container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: > RUNNING, Capability: , Diagnostics: [2018-03-17 > 22:01:00.675]Exception from container-launch. > Container id: container_e01_1521323860653_0001_01_05 > Exit code: -1 > Exception message: > Shell ouput: > [2018-03-17 22:01:00.750]Diagnostic message from attempt : > [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1. > , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
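A minimal sketch of that split, assuming the standard {{ContainerStatus}} getters; the exact log wording is illustrative:
{code:java}
// Sketch: keep the frequent status line at INFO, but push the potentially
// multi-line diagnostics down to DEBUG so they no longer flood the NM log.
ContainerStatus status = container.cloneAndGetContainerStatus();
LOG.info("Returning ContainerStatus: [ContainerId: "
    + status.getContainerId() + ", State: " + status.getState()
    + ", ExitStatus: " + status.getExitStatus() + "]");
if (LOG.isDebugEnabled()) {
  LOG.debug("Diagnostics for " + status.getContainerId() + ": "
      + status.getDiagnostics());
}
{code}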
[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605895#comment-16605895 ] Craig Condit commented on YARN-8751: {quote}[~jlowe] : So my vote is keep INVALID_CONTAINER_EXEC_PERMISSIONS and INVALID_CONFIG_FILE fatal but the others should only fail the single container launch rather than the whole NM process. {quote} Agreed. The remainder of the exit codes could be caused by any number of things, such as disk failure, which you point out. Even if the problem were to be caused by something more systemic, NM blacklisting should kick in pretty quickly as tasks fail. +1 on making this non-fatal. Additionally, we may want to consider updating the diagnostic message returned in the following {{else}} clause to contain the exit code enum name as well as the number – this would seem to make diagnosing problems much easier for both users and administrators. > Container-executor permission check errors cause the NM to be marked unhealthy > -- > > Key: YARN-8751 > URL: https://issues.apache.org/jira/browse/YARN-8751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Priority: Critical > > {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a > NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by > {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception > occurs based on the exit code returned by container-executor, and 7 different > exit codes cause the NM to be marked UNHEALTHY. > {code:java} > if (exitCode == > ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || > exitCode == > ExitCode.INVALID_CONFIG_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { > throw new ConfigurationException( > "Linux Container Executor reached unrecoverable exception", e);{code} > I can understand why these are treated as fatal with the existing process > container model. However, with privileged Docker containers this may be too > harsh, as Privileged Docker containers don't guarantee the user's identity > will be propagated into the container, so these mismatches can occur. Outside > of privileged containers, an application may inadvertently change the > permissions on one of these directories, triggering this condition. > In our case, a container changed the "appcache//" > directory permissions to 774. Some time later, the process in the container > died and the Retry Policy kicked in to RELAUNCH the container. When the > RELAUNCH occurred, container-executor checked the permissions of the > "appcache//" directory (the existing workdir is retained > for RELAUNCH) and returned exit code 35. Exit code 35 is > COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all > containers running on that node, when really only this container would have > been impacted. > {code:java} > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. 
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e15_1535130383425_0085_01_05 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 35 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch > container failed > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not > create container dirsCould not create local files and directories 5 6 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : run as user is user > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating script paths... > 2018-08-31 21:07:22,365
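On the suggestion above to surface the exit-code enum name in the diagnostics, a rough sketch of the idea (the helper loop and message wording are hypothetical, not the committed change; {{ExitCode}} is the enum from the check quoted in the description):
{code:java}
// Sketch only: resolve the ExitCode constant matching the numeric code so
// the diagnostic reads e.g. "COULD_NOT_CREATE_WORK_DIRECTORIES (35)".
String exitCodeName = "UNKNOWN";
for (ExitCode ec : ExitCode.values()) {
  if (ec.getExitCode() == exitCode) {
    exitCodeName = ec.name();
    break;
  }
}
String diagnostics =
    "Container launch failed: " + exitCodeName + " (" + exitCode + ")";
{code}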
[jira] [Updated] (YARN-8680) YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate
[ https://issues.apache.org/jira/browse/YARN-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Ambati updated YARN-8680: - Attachment: YARN-8680.03.patch > YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate > - > > Key: YARN-8680 > URL: https://issues.apache.org/jira/browse/YARN-8680 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Pradeep Ambati >Assignee: Pradeep Ambati >Priority: Critical > Attachments: YARN-8680.00.patch, YARN-8680.01.patch, > YARN-8680.02.patch, YARN-8680.03.patch > > > Similar to YARN-8242, implement an iterable abstraction for > LocalResourceTrackerState to load completed and in-progress resources on > demand rather than loading them all at once for the respective state. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
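To illustrate the shape of such an abstraction, a minimal hypothetical sketch (the real patch works against the NM state-store record types; the class and parameter names here are invented):
{code:java}
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

// Sketch only: adapt a raw state-store cursor into a typed iterator that
// deserializes one record per next() call instead of building a full list.
public class LazyRecordIterator<T> implements Iterator<T> {
  private final Iterator<Map.Entry<byte[], byte[]>> cursor;
  private final Function<byte[], T> parser;

  public LazyRecordIterator(Iterator<Map.Entry<byte[], byte[]>> cursor,
      Function<byte[], T> parser) {
    this.cursor = cursor;
    this.parser = parser;
  }

  @Override
  public boolean hasNext() {
    return cursor.hasNext();
  }

  @Override
  public T next() {
    // Parse lazily; memory stays bounded by one record at a time.
    return parser.apply(cursor.next().getValue());
  }
}
{code}
Callers can wrap this in an {{Iterable}} and process each recovered resource as it streams in, which keeps NM recovery memory bounded regardless of how many resources a tracker holds.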
[jira] [Updated] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-8751: - Priority: Critical (was: Major) > Container-executor permission check errors cause the NM to be marked unhealthy > -- > > Key: YARN-8751 > URL: https://issues.apache.org/jira/browse/YARN-8751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Priority: Critical > > {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a > NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by > {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception > occurs based on the exit code returned by container-executor, and 7 different > exit codes cause the NM to be marked UNHEALTHY. > {code:java} > if (exitCode == > ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || > exitCode == > ExitCode.INVALID_CONFIG_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { > throw new ConfigurationException( > "Linux Container Executor reached unrecoverable exception", e);{code} > I can understand why these are treated as fatal with the existing process > container model. However, with privileged Docker containers this may be too > harsh, as Privileged Docker containers don't guarantee the user's identity > will be propagated into the container, so these mismatches can occur. Outside > of privileged containers, an application may inadvertently change the > permissions on one of these directories, triggering this condition. > In our case, a container changed the "appcache//" > directory permissions to 774. Some time later, the process in the container > died and the Retry Policy kicked in to RELAUNCH the container. When the > RELAUNCH occurred, container-executor checked the permissions of the > "appcache//" directory (the existing workdir is retained > for RELAUNCH) and returned exit code 35. Exit code 35 is > COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all > containers running on that node, when really only this container would have > been impacted. > {code:java} > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. 
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e15_1535130383425_0085_01_05 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 35 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch > container failed > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not > create container dirsCould not create local files and directories 5 6 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : run as user is user > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating script paths... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating local dirs... > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Path > /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05 > has permission 774 but needs per > mission 750. > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null) > 2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch > (ContainerRelaunch.java:call(129)) - Failed to launch container due to > configuration error. > org.apache.hadoop.yarn.exceptions.ConfigurationException:
[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605803#comment-16605803 ] Jason Lowe commented on YARN-8751: -- A bad container executor or config file is pretty catastrophic since the NM can't control anything at that point, including the inability to even cleanup containers when it shuts down. However the other errors are specific to setting up an individual container and should not bring down the NM. If a disk goes bad and the container executor can't create one of the directories then this should not be a fatal error to the NM, just a fatal error to that container launch. Otherwise a single disk failure can bring down the NM if the container executor discovers it before the NM disk checker does. So my vote is keep INVALID_CONTAINER_EXEC_PERMISSIONS and INVALID_CONFIG_FILE fatal but the others should only fail the single container launch rather than the whole NM process. > Container-executor permission check errors cause the NM to be marked unhealthy > -- > > Key: YARN-8751 > URL: https://issues.apache.org/jira/browse/YARN-8751 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Priority: Major > > {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a > NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by > {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception > occurs based on the exit code returned by container-executor, and 7 different > exit codes cause the NM to be marked UNHEALTHY. > {code:java} > if (exitCode == > ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || > exitCode == > ExitCode.INVALID_CONFIG_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || > exitCode == > ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { > throw new ConfigurationException( > "Linux Container Executor reached unrecoverable exception", e);{code} > I can understand why these are treated as fatal with the existing process > container model. However, with privileged Docker containers this may be too > harsh, as Privileged Docker containers don't guarantee the user's identity > will be propagated into the container, so these mismatches can occur. Outside > of privileged containers, an application may inadvertently change the > permissions on one of these directories, triggering this condition. > In our case, a container changed the "appcache//" > directory permissions to 774. Some time later, the process in the container > died and the Retry Policy kicked in to RELAUNCH the container. When the > RELAUNCH occurred, container-executor checked the permissions of the > "appcache//" directory (the existing workdir is retained > for RELAUNCH) and returned exit code 35. Exit code 35 is > COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all > containers running on that node, when really only this container would have > been impacted. > {code:java} > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. 
> 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e15_1535130383425_0085_01_05 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 35 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch > container failed > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not > create container dirsCould not create local files and directories 5 6 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : run as user is user > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn > 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Creating script
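Under [~jlowe]'s proposal, the check quoted in the description would shrink to something like the following sketch (not the committed patch):
{code:java}
// Sketch only: treat just a broken executor binary or config file as fatal
// to the NM; all other exit codes fail only this container's launch.
if (exitCode == ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode()
    || exitCode == ExitCode.INVALID_CONFIG_FILE.getExitCode()) {
  throw new ConfigurationException(
      "Linux Container Executor reached unrecoverable exception", e);
}
// Directory-creation and similar failures fall through here and surface
// as a per-container launch failure instead of marking the NM unhealthy.
{code}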
[jira] [Updated] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader
[ https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushil Ks updated YARN-8270: Description: This Jira is for emitting JMX Metrics for ATS v2 Timeline Collector and Timeline Reader, basically for Timeline Collector it tries to capture success and failure latencies for *putEntities* and *putEntitiesAsync* from *TimelineCollectorWebService* , similarly all the API's success and failure latencies for fetching TimelineEntities from *TimelineReaderWebServices*. This would actually help in monitoring and measuring performance for ATSv2 at scale. (was: This Jira is for emitting JMX Metrics for ATS v2 Timeline Collector and Timeline Reader, basically for Timeline Collector it tries to capture success, failure and latencies for *putEntities* and *putEntitiesAsync* from *TimelineCollectorWebService* and all the API's success, failure and latencies for fetching TimelineEntities from *TimelineReaderWebServices*. This would actually help in monitoring and measuring performance for ATSv2 at scale.) > Adding JMX Metrics for Timeline Collector and Reader > > > Key: YARN-8270 > URL: https://issues.apache.org/jira/browse/YARN-8270 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2, timelineserver >Reporter: Sushil Ks >Assignee: Sushil Ks >Priority: Major > Attachments: YARN-8270.001.patch, YARN-8270.002.patch > > > This Jira is for emitting JMX Metrics for ATS v2 Timeline Collector and > Timeline Reader, basically for Timeline Collector it tries to capture success > and failure latencies for *putEntities* and *putEntitiesAsync* from > *TimelineCollectorWebService* , similarly all the API's success and failure > latencies for fetching TimelineEntities from *TimelineReaderWebServices*. > This would actually help in monitoring and measuring performance for ATSv2 at > scale. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
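As a rough sketch of what such a metrics source can look like with Hadoop's Metrics2 library (the class, field, and method names here are illustrative, not the ones in the attached patches):
{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Sketch only: one MutableRate per outcome, so success and failure
// latencies for putEntities are counted and averaged separately.
@Metrics(about = "ATSv2 timeline collector metrics", context = "yarn")
public class CollectorMetricsSketch {
  @Metric("putEntities success latency") MutableRate putEntitiesSuccess;
  @Metric("putEntities failure latency") MutableRate putEntitiesFailure;

  public void addPutEntitiesLatency(long latencyMs, boolean succeeded) {
    // MutableRate tracks both the sample count and the mean latency,
    // which is what gets exposed over JMX once the source is registered.
    if (succeeded) {
      putEntitiesSuccess.add(latencyMs);
    } else {
      putEntitiesFailure.add(latencyMs);
    }
  }
}
{code}
The source still has to be registered with {{DefaultMetricsSystem}} before the rates show up over JMX.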
[jira] [Updated] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
[ https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-8751: -- Description: {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception occurs based on the exit code returned by container-executor, and 7 different exit codes cause the NM to be marked UNHEALTHY. {code:java} if (exitCode == ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || exitCode == ExitCode.INVALID_CONFIG_FILE.getExitCode() || exitCode == ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || exitCode == ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || exitCode == ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || exitCode == ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || exitCode == ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { throw new ConfigurationException( "Linux Container Executor reached unrecoverable exception", e);{code} I can understand why these are treated as fatal with the existing process container model. However, with privileged Docker containers this may be too harsh, as Privileged Docker containers don't guarantee the user's identity will be propagated into the container, so these mismatches can occur. Outside of privileged containers, an application may inadvertently change the permissions on one of these directories, triggering this condition. In our case, a container changed the "appcache//" directory permissions to 774. Some time later, the process in the container died and the Retry Policy kicked in to RELAUNCH the container. When the RELAUNCH occurred, container-executor checked the permissions of the "appcache//" directory (the existing workdir is retained for RELAUNCH) and returned exit code 35. Exit code 35 is COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all containers running on that node, when really only this container would have been impacted. {code:java} 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Container id: container_e15_1535130383425_0085_01_05 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exit code: 35 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch container failed 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not create container dirsCould not create local files and directories 5 6 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Shell output: main : command provided 4 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - main : run as user is user 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Creating script paths... 
2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Creating local dirs... 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Path /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05 has permission 774 but needs per mission 750. 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null) 2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch (ContainerRelaunch.java:call(129)) - Failed to launch container due to configuration error. org.apache.hadoop.yarn.exceptions.ConfigurationException: Linux Container Executor reached unrecoverable exception at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleExitCode(LinuxContainerExecutor.java:633) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:573) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) at
[jira] [Created] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy
Shane Kumpf created YARN-8751: - Summary: Container-executor permission check errors cause the NM to be marked unhealthy Key: YARN-8751 URL: https://issues.apache.org/jira/browse/YARN-8751 Project: Hadoop YARN Issue Type: Bug Reporter: Shane Kumpf {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception occurs based on the exit code returned by container-executor, and 7 different exit codes cause the NM to be marked UNHEALTHY. {code:java} if (exitCode == ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() || exitCode == ExitCode.INVALID_CONFIG_FILE.getExitCode() || exitCode == ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() || exitCode == ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() || exitCode == ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() || exitCode == ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() || exitCode == ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) { throw new ConfigurationException( "Linux Container Executor reached unrecoverable exception", e);{code} I can understand why these are treated as fatal with the existing process container model. However, with privileged Docker containers this may be too harsh, as Privileged Docker containers don't guarantee the user's identity will be propagated into the container. In our case, a privileged container changed the "appcache//" directory permissions to 774. Some time later, the process in the container died and the Retry Policy kicked in to RELAUNCH the container. When the RELAUNCH occurred, container-executor checked the permissions of the "appcache//" directory (the existing workdir is retained for RELAUNCH) and returned exit code 35. Exit code 35 is COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all containers running on that node, when really only this container would have been impacted. {code:java} 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception from container-launch. 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Container id: container_e15_1535130383425_0085_01_05 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exit code: 35 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch container failed 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not create container dirsCould not create local files and directories 5 6 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Shell output: main : command provided 4 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - main : run as user is user 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Creating script paths... 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Creating local dirs... 
2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Path /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05 has permission 774 but needs per mission 750. 2018-08-31 21:07:22,365 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null) 2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch (ContainerRelaunch.java:call(129)) - Failed to launch container due to configuration error. org.apache.hadoop.yarn.exceptions.ConfigurationException: Linux Container Executor reached unrecoverable exception at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleExitCode(LinuxContainerExecutor.java:633) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:573) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
[jira] [Created] (YARN-8750) Refactor TestQueueMetrics
Szilard Nemeth created YARN-8750: Summary: Refactor TestQueueMetrics Key: YARN-8750 URL: https://issues.apache.org/jira/browse/YARN-8750 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Szilard Nemeth Assignee: Szilard Nemeth {{TestQueueMetrics#checkApps}} and {{TestQueueMetrics#checkResources}} have 8 and 14 parameters, respectively. It is very hard to read the testcases that are using these methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
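One common shape for such a refactor is a small checker object with a fluent builder, so call sites name each expectation instead of passing a long positional argument list. A hypothetical sketch with a subset of the fields (the actual refactor in the eventual patch may differ):
{code:java}
// Sketch only: replaces e.g. checkApps(ms, 1, 1, 0, 0, 0, 0, true) with
// named, self-documenting expectations.
public final class AppMetricsExpectation {
  private int submitted;
  private int pending;
  private int running;
  private int completed;

  public AppMetricsExpectation submitted(int n) { submitted = n; return this; }
  public AppMetricsExpectation pending(int n)   { pending = n; return this; }
  public AppMetricsExpectation running(int n)   { running = n; return this; }
  public AppMetricsExpectation completed(int n) { completed = n; return this; }

  public void assertMatches(QueueMetrics metrics) {
    // Compare each expected counter against the live metrics record here.
  }
}

// Usage in a test:
// new AppMetricsExpectation().submitted(1).pending(1).assertMatches(metrics);
{code}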
[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context
[ https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605721#comment-16605721 ] Hadoop QA commented on YARN-8258: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-8258 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8258 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21777/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > YARN webappcontext for UI2 should inherit all filters from default context > -- > > Key: YARN-8258 > URL: https://issues.apache.org/jira/browse/YARN-8258 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Sumana Sathish >Assignee: Sunil Govindan >Priority: Major > Attachments: Screen Shot 2018-06-26 at 5.54.35 PM.png, > YARN-8258.001.patch, YARN-8258.002.patch, YARN-8258.003.patch, > YARN-8258.004.patch, YARN-8258.005.patch, YARN-8258.006.patch, > YARN-8258.007.patch, YARN-8258.008.patch > > > Thanks [~ssath...@hortonworks.com] for finding this. > Ideally all filters from default context has to be inherited to UI2 context > as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
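For context on what "inheriting filters" can mean mechanically, a hypothetical Jetty 9 sketch (the actual patch goes through Hadoop's {{HttpServer2}}/webapp plumbing, which differs; sharing holder instances like this is an assumption for illustration only):
{code:java}
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.servlet.ServletHandler;

// Sketch only: point the UI2 context at the same filter registrations
// and mappings as the default webapp context.
public final class FilterInheritSketch {
  static void inheritFilters(ServletContextHandler defaultContext,
      ServletContextHandler ui2Context) {
    ServletHandler src = defaultContext.getServletHandler();
    ServletHandler dst = ui2Context.getServletHandler();
    dst.setFilters(src.getFilters());
    dst.setFilterMappings(src.getFilterMappings());
  }
}
{code}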
[jira] [Commented] (YARN-8745) Misplaced the TestRMWebServicesFairScheduler.java file.
[ https://issues.apache.org/jira/browse/YARN-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605702#comment-16605702 ] Hadoop QA commented on YARN-8745: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 11 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 40s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 5s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 56s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}120m 42s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8745 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938618/YARN-8745.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 8e3f375236df 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 962089a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/21774/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21774/testReport/ | | Max. process+thread count | 936 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605683#comment-16605683 ] ASF GitHub Bot commented on YARN-8747: -- Github user collinmazb closed the pull request at: https://github.com/apache/hadoop/pull/411 > [UI2] YARN UI2 page loading failed due to js error under some time zone > configuration > - > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Assignee: collinma >Priority: Blocker > Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed hadoop 3.1.1 on centos 7.2 servers whose timezone is configured > as GMT+8, the web browser time zone is GMT+8 too. yarn ui page loaded failed > due to js error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. This has been fixed in > moment-timezone > v0.5.1([see|[https://github.com/moment/moment-timezone/issues/294]).] We need > to update moment-timezone version accordingly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605673#comment-16605673 ] Hadoop QA commented on YARN-8747: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 35m 2s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 23s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 42s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 51m 14s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8747 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938635/YARN-8747.001.patch | | Optional Tests | dupname asflicense shadedclient | | uname | Linux 31b3eda5335f 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 962089a | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 301 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21775/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > [UI2] YARN UI2 page loading failed due to js error under some time zone > configuration > - > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Assignee: collinma >Priority: Blocker > Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed hadoop 3.1.1 on centos 7.2 servers whose timezone is configured > as GMT+8, the web browser time zone is GMT+8 too. yarn ui page loaded failed > due to js error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. 
This has been fixed in > moment-timezone > v0.5.1([see|[https://github.com/moment/moment-timezone/issues/294]).] We need > to update moment-timezone version accordingly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8746) ui2 overview doesn't display GPU usage info when using Fairscheduler
[ https://issues.apache.org/jira/browse/YARN-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] collinma reassigned YARN-8746: -- Assignee: collinma > ui2 overview doesn't display GPU usage info when using Fairscheduler > - > > Key: YARN-8746 > URL: https://issues.apache.org/jira/browse/YARN-8746 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: collinma >Assignee: collinma >Priority: Blocker > Labels: GPU, fairscheduler, yarn > Original Estimate: 1h > Remaining Estimate: 1h > > When using fair scheduler, GPU related information isn't displayed because > the "metrics" api doesn't return any GPU related usage information( has run > yarn on GPU per [this > |[https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/UsingGpus.html]).] > The hadoop version is 3.1.1 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605631#comment-16605631 ] collinma commented on YARN-8747: Thanks [~sunilg]. I've re-sent a PR ([https://github.com/apache/hadoop/pull/412]). Thanks for your work, I really appreciate it! > [UI2] YARN UI2 page loading failed due to js error under some time zone > configuration > - > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Assignee: collinma >Priority: Blocker > Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed hadoop 3.1.1 on centos 7.2 servers whose timezone is configured > as GMT+8, the web browser time zone is GMT+8 too. yarn ui page loaded failed > due to js error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. This has been fixed in > moment-timezone > v0.5.1([see|[https://github.com/moment/moment-timezone/issues/294]).] We need > to update moment-timezone version accordingly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605630#comment-16605630 ] ASF GitHub Bot commented on YARN-8747: -- GitHub user collinmazb opened a pull request: https://github.com/apache/hadoop/pull/412 YARN-8747: update moment-timezone version to 0.5.1 re-sent a PR per https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel You can merge this pull request into a Git repository by running: $ git pull https://github.com/collinmazb/hadoop trunk Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hadoop/pull/412.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #412 commit b547969d0446ad3a9fb1aa9038baaa091f4fc225 Author: collinma Date: 2018-09-06T10:54:19Z YARN-8747: update moment-timezone version to 0.5.1 > [UI2] YARN UI2 page loading failed due to js error under some time zone > configuration > - > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Assignee: collinma >Priority: Blocker > Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed hadoop 3.1.1 on centos 7.2 servers whose timezone is configured > as GMT+8, the web browser time zone is GMT+8 too. yarn ui page loaded failed > due to js error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. This has been fixed in > moment-timezone > v0.5.1([see|[https://github.com/moment/moment-timezone/issues/294]).] We need > to update moment-timezone version accordingly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8726) [UI2] YARN UI2 is not accessible when config.env file failed to load
[ https://issues.apache.org/jira/browse/YARN-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605601#comment-16605601 ] Sunil Govindan commented on YARN-8726: -- This looks good to me. Will commit shortly if no objections > [UI2] YARN UI2 is not accessible when config.env file failed to load > > > Key: YARN-8726 > URL: https://issues.apache.org/jira/browse/YARN-8726 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Akhil PB >Assignee: Akhil PB >Priority: Critical > Attachments: YARN-8726.001.patch > > > It is observed that yarn UI2 is not accessible. When UI2 is inspected, it > gives below error > {code:java} > index.html:1 Refused to execute script from > 'http://ctr-e138-1518143905142-456429-01-05.hwx.site:8088/ui2/config/configs.env' > because its MIME type ('text/plain') is not executable, and strict MIME type > checking is enabled. > yarn-ui.js:219 base url: > vendor.js:1978 ReferenceError: ENV is not defined > at updateConfigs (yarn-ui.js:212) > at Object.initialize (yarn-ui.js:218) > at vendor.js:824 > at vendor.js:825 > at visit (vendor.js:3025) > at Object.visit [as default] (vendor.js:3024) > at DAG.topsort (vendor.js:750) > at Class._runInitializer (vendor.js:825) > at Class.runInitializers (vendor.js:824) > at Class._bootSync (vendor.js:823) > onerrorDefault @ vendor.js:1978 > trigger @ vendor.js:2967 > (anonymous) @ vendor.js:3006 > invoke @ vendor.js:626 > flush @ vendor.js:629 > flush @ vendor.js:619 > end @ vendor.js:642 > run @ vendor.js:648 > join @ vendor.js:648 > run.join @ vendor.js:1510 > (anonymous) @ vendor.js:1512 > fire @ vendor.js:230 > fireWith @ vendor.js:235 > ready @ vendor.js:242 > completed @ vendor.js:242 > vendor.js:823 Uncaught ReferenceError: ENV is not defined > at updateConfigs (yarn-ui.js:212) > at Object.initialize (yarn-ui.js:218) > at vendor.js:824 > at vendor.js:825 > at visit (vendor.js:3025) > at Object.visit [as default] (vendor.js:3024) > at DAG.topsort (vendor.js:750) > at Class._runInitializer (vendor.js:825) > at Class.runInitializers (vendor.js:824) > at Class._bootSync (vendor.js:823) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan reassigned YARN-8747: Assignee: collinma > [UI2] YARN UI2 page loading failed due to js error under some time zone > configuration > - > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Assignee: collinma >Priority: Blocker > Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed hadoop 3.1.1 on centos 7.2 servers whose timezone is configured > as GMT+8, the web browser time zone is GMT+8 too. yarn ui page loaded failed > due to js error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. This has been fixed in > moment-timezone > v0.5.1([see|[https://github.com/moment/moment-timezone/issues/294]).] We need > to update moment-timezone version accordingly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605599#comment-16605599 ] Sunil Govindan commented on YARN-8747: -- Thanks [~collinma] for the patch. I think your pull request had a few more commits; the current patch attached here looks good to me, so I could commit this patch instead of the pull request. Alternatively, I can merge via the pull request if you can share a correct pull request against the "trunk" branch with this change alone. Also adding [~collinma] as a contributor so you can assign JIRAs later. > [UI2] YARN UI2 page loading failed due to js error under some time zone > configuration > - > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Priority: Blocker > Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed hadoop 3.1.1 on centos 7.2 servers whose timezone is configured > as GMT+8, the web browser time zone is GMT+8 too. yarn ui page loaded failed > due to js error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. This has been fixed in > moment-timezone > v0.5.1([see|[https://github.com/moment/moment-timezone/issues/294]).] We need > to update moment-timezone version accordingly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akhil PB reassigned YARN-8747: -- Assignee: (was: Akhil PB) > [UI2] YARN UI2 page loading failed due to js error under some time zone > configuration > - > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Priority: Blocker > Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed hadoop 3.1.1 on centos 7.2 servers whose timezone is configured > as GMT+8, the web browser time zone is GMT+8 too. yarn ui page loaded failed > due to js error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. This has been fixed in > moment-timezone > v0.5.1([see|[https://github.com/moment/moment-timezone/issues/294]).] We need > to update moment-timezone version accordingly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605573#comment-16605573 ] Akhil PB commented on YARN-8747: Hi, the PR includes many other changes along with the moment-timezone update. I have submitted a patch only for the moment-timezone update, since the bug is related to UI2. [~sunilg] could you please help review? > [UI2] YARN UI2 page loading failed due to js error under some time zone > configuration > - > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Assignee: Akhil PB >Priority: Blocker > Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed hadoop 3.1.1 on centos 7.2 servers whose timezone is configured > as GMT+8, the web browser time zone is GMT+8 too. yarn ui page loaded failed > due to js error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. This has been fixed in > moment-timezone > v0.5.1([see|[https://github.com/moment/moment-timezone/issues/294]).] We need > to update moment-timezone version accordingly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605564#comment-16605564 ] Akhil PB edited comment on YARN-8747 at 9/6/18 9:55 AM: Attaching patch for the moment-timezone update. cc [~sunilg] was (Author: akhilpb): Attaching patch for the moment-timezone update. > [UI2] YARN UI2 page loading failed due to js error under some time zone > configuration > - > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Assignee: Akhil PB >Priority: Blocker > Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed hadoop 3.1.1 on centos 7.2 servers whose timezone is configured > as GMT+8, the web browser time zone is GMT+8 too. yarn ui page loaded failed > due to js error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. This has been fixed in > moment-timezone > v0.5.1([see|[https://github.com/moment/moment-timezone/issues/294]).] We need > to update moment-timezone version accordingly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605565#comment-16605565 ] collinma edited comment on YARN-8747 at 9/6/18 9:55 AM: hi there, I've sent a pr([https://github.com/apache/hadoop/pull/411)] which update moment-timezone version to 0.5.1. Could someone here help review it? was (Author: collinma): hi there, I've send a pr([https://github.com/apache/hadoop/pull/411)] which update moment-timezone version to 0.5.1. Could someone here help review it? > [UI2] YARN UI2 page loading failed due to js error under some time zone > configuration > - > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Assignee: Akhil PB >Priority: Blocker > Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed hadoop 3.1.1 on centos 7.2 servers whose timezone is configured > as GMT+8, the web browser time zone is GMT+8 too. yarn ui page loaded failed > due to js error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. This has been fixed in > moment-timezone > v0.5.1([see|[https://github.com/moment/moment-timezone/issues/294]).] We need > to update moment-timezone version accordingly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605565#comment-16605565 ] collinma commented on YARN-8747: hi there, I've send a pr([https://github.com/apache/hadoop/pull/411)] which update moment-timezone version to 0.5.1. Could someone here help review it? > [UI2] YARN UI2 page loading failed due to js error under some time zone > configuration > - > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Assignee: Akhil PB >Priority: Blocker > Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed hadoop 3.1.1 on centos 7.2 servers whose timezone is configured > as GMT+8, the web browser time zone is GMT+8 too. yarn ui page loaded failed > due to js error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. This has been fixed in > moment-timezone > v0.5.1([see|[https://github.com/moment/moment-timezone/issues/294]).] We need > to update moment-timezone version accordingly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akhil PB updated YARN-8747: --- Summary: [UI2] YARN UI2 page loading failed due to js error under some time zone configuration (was: ui2 page loading failed due to js error under some time zone configuration) > [UI2] YARN UI2 page loading failed due to js error under some time zone > configuration > - > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Assignee: Akhil PB >Priority: Blocker > Attachments: image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed hadoop 3.1.1 on centos 7.2 servers whose timezone is configured > as GMT+8, the web browser time zone is GMT+8 too. yarn ui page loaded failed > due to js error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. This has been fixed in > moment-timezone > v0.5.1([see|[https://github.com/moment/moment-timezone/issues/294]).] We need > to update moment-timezone version accordingly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8747) ui2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akhil PB reassigned YARN-8747: -- Assignee: Akhil PB > ui2 page loading failed due to js error under some time zone configuration > -- > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Assignee: Akhil PB >Priority: Blocker > Attachments: image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed hadoop 3.1.1 on centos 7.2 servers whose timezone is configured > as GMT+8, the web browser time zone is GMT+8 too. yarn ui page loaded failed > due to js error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. This has been fixed in > moment-timezone > v0.5.1([see|[https://github.com/moment/moment-timezone/issues/294]).] We need > to update moment-timezone version accordingly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context
[ https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605557#comment-16605557 ] Hadoop QA commented on YARN-8258: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-8258 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8258 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21773/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > YARN webappcontext for UI2 should inherit all filters from default context > -- > > Key: YARN-8258 > URL: https://issues.apache.org/jira/browse/YARN-8258 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Sumana Sathish >Assignee: Sunil Govindan >Priority: Major > Attachments: Screen Shot 2018-06-26 at 5.54.35 PM.png, > YARN-8258.001.patch, YARN-8258.002.patch, YARN-8258.003.patch, > YARN-8258.004.patch, YARN-8258.005.patch, YARN-8258.006.patch, > YARN-8258.007.patch, YARN-8258.008.patch > > > Thanks [~ssath...@hortonworks.com] for finding this. > Ideally, all filters from the default context have to be inherited by the > UI2 context as well.
[jira] [Commented] (YARN-8745) Misplaced the TestRMWebServicesFairScheduler.java file.
[ https://issues.apache.org/jira/browse/YARN-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605512#comment-16605512 ] Y. SREENIVASULU REDDY commented on YARN-8745: - [~bibinchundatt] I have attached the patch and addressed your comments. > Misplaced the TestRMWebServicesFairScheduler.java file. > --- > > Key: YARN-8745 > URL: https://issues.apache.org/jira/browse/YARN-8745 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, test >Reporter: Y. SREENIVASULU REDDY >Assignee: Y. SREENIVASULU REDDY >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8745.001.patch, YARN-8745.002.patch > > > The TestRMWebServicesFairScheduler.java file exists in > {noformat} > hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java > {noformat} > but the package declaration is > {noformat} > package org.apache.hadoop.yarn.server.resourcemanager.webapp.fairscheduler; > {noformat} > so the file is being moved to the proper package. This issue was triggered > from YARN-7451.
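In practice the fix is just relocating the test so its directory matches the declared package, roughly as follows (a sketch; the target directory follows directly from the package name, and the paths are abbreviated):
{noformat}
git mv .../resourcemanager/webapp/TestRMWebServicesFairScheduler.java \
       .../resourcemanager/webapp/fairscheduler/TestRMWebServicesFairScheduler.java
{noformat}
No change to the package declaration itself is needed; a move like this may additionally require widening access to any package-private helpers the test used from its old package.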
[jira] [Updated] (YARN-8745) Misplaced the TestRMWebServicesFairScheduler.java file.
[ https://issues.apache.org/jira/browse/YARN-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Y. SREENIVASULU REDDY updated YARN-8745: Attachment: YARN-8745.002.patch > Misplaced the TestRMWebServicesFairScheduler.java file. > --- > > Key: YARN-8745 > URL: https://issues.apache.org/jira/browse/YARN-8745 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, test >Reporter: Y. SREENIVASULU REDDY >Assignee: Y. SREENIVASULU REDDY >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8745.001.patch, YARN-8745.002.patch > > > The TestRMWebServicesFairScheduler.java file exists in > {noformat} > hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java > {noformat} > but the package declaration is > {noformat} > package org.apache.hadoop.yarn.server.resourcemanager.webapp.fairscheduler; > {noformat} > so the file is being moved to the proper package. This issue was triggered > from YARN-7451.
[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context
[ https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605493#comment-16605493 ] Sunil Govindan commented on YARN-8258: -- Kicking Jenkins. > YARN webappcontext for UI2 should inherit all filters from default context > -- > > Key: YARN-8258 > URL: https://issues.apache.org/jira/browse/YARN-8258 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Sumana Sathish >Assignee: Sunil Govindan >Priority: Major > Attachments: Screen Shot 2018-06-26 at 5.54.35 PM.png, > YARN-8258.001.patch, YARN-8258.002.patch, YARN-8258.003.patch, > YARN-8258.004.patch, YARN-8258.005.patch, YARN-8258.006.patch, > YARN-8258.007.patch, YARN-8258.008.patch > > > Thanks [~ssath...@hortonworks.com] for finding this. > Ideally, all filters from the default context have to be inherited by the > UI2 context as well.
[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context
[ https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605492#comment-16605492 ] Sunil Govindan commented on YARN-8258: -- If there are no objections, I'll get this patch in for the 3.2 release. > YARN webappcontext for UI2 should inherit all filters from default context > -- > > Key: YARN-8258 > URL: https://issues.apache.org/jira/browse/YARN-8258 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Sumana Sathish >Assignee: Sunil Govindan >Priority: Major > Attachments: Screen Shot 2018-06-26 at 5.54.35 PM.png, > YARN-8258.001.patch, YARN-8258.002.patch, YARN-8258.003.patch, > YARN-8258.004.patch, YARN-8258.005.patch, YARN-8258.006.patch, > YARN-8258.007.patch, YARN-8258.008.patch > > > Thanks [~ssath...@hortonworks.com] for finding this. > Ideally, all filters from the default context have to be inherited by the > UI2 context as well.
[jira] [Commented] (YARN-5592) Add support for dynamic resource updates with multiple resource types
[ https://issues.apache.org/jira/browse/YARN-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605491#comment-16605491 ] Sunil Govindan commented on YARN-5592: -- A few comments on the attached design doc: 1. {{We can introduce a file named dynamic-resource-types.xml}}. This doesn't look like a clean approach to me. We should use the existing resource-types.xml and add new types as desired; once it is reloaded, YARN should auto-detect what was newly added and update itself internally. 2. {{We can introduce an option something like “-refreshResourceTypes”}}. I am fine with such a CLI option to force YARN to fetch updated resource types, though the naming seems a bit confusing and could be improved. 3. *Approach A:* is not good as it kills containers. In my view, *Update existing resource types in RM (resource1)* is not an immediate use case, so let's skip it in the first round. 4. My 2 cents on removal of resource types: this is one of the more complex operations. Some node may have the resource type, some containers may be running with it, some may be waiting, etc. Hence removal of a resource type cannot be done seamlessly; I think we should ideally restart YARN in such cases. [~leftnoteasy] [~cheersyang], what are your thoughts on supporting removal of a resource type from YARN at run time? > Add support for dynamic resource updates with multiple resource types > - > > Key: YARN-5592 > URL: https://issues.apache.org/jira/browse/YARN-5592 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Manikandan R >Priority: Major > Attachments: YARN-5592-design-2.docx >
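To make point 1 concrete, here is a minimal sketch of adding a new type through the existing resource-types.xml (resource1 is just the placeholder name from the design doc; the property keys follow the standard resource-types.xml format):
{code}
<configuration>
  <property>
    <name>yarn.resource-types</name>
    <value>resource1</value>
  </property>
  <property>
    <name>yarn.resource-types.resource1.units</name>
    <value>G</value>
  </property>
</configuration>
{code}
On a refresh (e.g. via the proposed {{-refreshResourceTypes}} option from point 2, which does not exist today), the RM would diff this file against the currently registered types and register only the additions.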
[jira] [Comment Edited] (YARN-8749) Restrict job submission to queue based on apptype
[ https://issues.apache.org/jira/browse/YARN-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605489#comment-16605489 ] Oleksandr Shevchenko edited comment on YARN-8749 at 9/6/18 8:59 AM: YARN has the information about the type of each application in ApplicationSubmissionContext. We can implement this in RMAppManager#createAndPopulateNewRMApp() the same way as the ACL check for a queue, and add an additional property for the Fair and Capacity schedulers. For example: {code} SPARK {code} In this case "q1" will have a list of accessible application types which contains only the "SPARK" type. As a result, only Spark applications can be submitted; applications with other types (YARN, TEZ, etc.) will be rejected. Could someone evaluate this feature and approach? If no one objects, I would like to start working on this. Thanks a lot for any comments. was (Author: oshevchenko): YARN has the information about the type of each application in ApplicationSubmissionContext. We can implement this in RMAppManager#createAndPopulateNewRMApp() the same way as the ACL check for a queue, and add an additional property for the Fair and Capacity schedulers. For example: SPARK In this case "q1" will have a list of accessible application types which contains only the "SPARK" type. As a result, only Spark applications can be submitted; applications with other types (YARN, TEZ, etc.) will be rejected. Could someone evaluate this feature and approach? If no one objects, I would like to start working on this. Thanks a lot for any comments. > Restrict job submission to queue based on apptype > - > > Key: YARN-8749 > URL: https://issues.apache.org/jira/browse/YARN-8749 > Project: Hadoop YARN > Issue Type: New Feature > Components: RM, scheduler >Reporter: Oleksandr Shevchenko >Assignee: Oleksandr Shevchenko >Priority: Minor > > The proposal here is to add a new property for queue tuning that allows > submitting an application to a queue only if its type is among the allowed > types. If an application has a type different from the queue's allowed types, > the application should be rejected.
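For illustration, the per-queue property proposed above might look like the following in a Fair Scheduler allocation file (the {{accessibleApplicationTypes}} element name is hypothetical, chosen to mirror existing per-queue settings such as {{aclSubmitApps}}):
{code}
<allocations>
  <queue name="q1">
    <!-- hypothetical element: only applications whose type is listed here may be submitted -->
    <accessibleApplicationTypes>SPARK</accessibleApplicationTypes>
  </queue>
</allocations>
{code}
A comma-separated value would allow several types per queue, with "all types allowed" as a sensible default when the element is absent.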
[jira] [Commented] (YARN-8749) Restrict job submission to queue based on apptype
[ https://issues.apache.org/jira/browse/YARN-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605489#comment-16605489 ] Oleksandr Shevchenko commented on YARN-8749: YARN has the information about the type of each application in ApplicationSubmissionContext. We can implement this in RMAppManager#createAndPopulateNewRMApp() the same way as the ACL check for a queue, and add an additional property for the Fair and Capacity schedulers. For example: SPARK In this case "q1" will have a list of accessible application types which contains only the "SPARK" type. As a result, only Spark applications can be submitted; applications with other types (YARN, TEZ, etc.) will be rejected. Could someone evaluate this feature and approach? If no one objects, I would like to start working on this. Thanks a lot for any comments. > Restrict job submission to queue based on apptype > - > > Key: YARN-8749 > URL: https://issues.apache.org/jira/browse/YARN-8749 > Project: Hadoop YARN > Issue Type: New Feature > Components: RM, scheduler >Reporter: Oleksandr Shevchenko >Assignee: Oleksandr Shevchenko >Priority: Minor > > The proposal here is to add a new property for queue tuning that allows > submitting an application to a queue only if its type is among the allowed > types. If an application has a type different from the queue's allowed types, > the application should be rejected.
[jira] [Created] (YARN-8749) Restrict job submission to queue based on apptype
Oleksandr Shevchenko created YARN-8749: -- Summary: Restrict job submission to queue based on apptype Key: YARN-8749 URL: https://issues.apache.org/jira/browse/YARN-8749 Project: Hadoop YARN Issue Type: New Feature Components: RM, scheduler Reporter: Oleksandr Shevchenko Assignee: Oleksandr Shevchenko The proposal here is to add a new property for queue tuning that allows submitting an application to a queue only if its type is among the allowed types. If an application has a type different from the queue's allowed types, the application should be rejected.