[jira] [Commented] (YARN-8692) Support node utilization metrics for SLS
[ https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589730#comment-16589730 ] Tao Yang commented on YARN-8692: Thanks [~cheersyang] for the feedback. I have attached the v1 patch; please help review it when you have time. > Support node utilization metrics for SLS > > > Key: YARN-8692 > URL: https://issues.apache.org/jira/browse/YARN-8692 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler-load-simulator > Affects Versions: 3.2.0 > Reporter: Tao Yang > Assignee: Tao Yang > Priority: Major > Attachments: YARN-8692.001.patch, image-2018-08-21-18-04-22-749.png > > > The distribution of node utilization is an important health indicator for a YARN cluster; the related metrics in SLS can be used to evaluate scheduling effects and to optimize related configurations. > To implement this improvement, we need to do the following: > (1) Add input configurations (containing the average and standard deviation of the cpu/memory utilization ratio) and generate utilization samples for tasks, excluding the AM container since its utilization is negligible. > (2) Simulate container and node utilization within the node status. > (3) Calculate and generate the distribution metrics, using the standard deviation metric (stddev for short) to evaluate the effects (smaller is better). > (4) Show these metrics on the SLS simulator page like this: > !image-2018-08-21-18-04-22-749.png! > For the node memory/CPU utilization distribution graphs, the Y-axis is the number of nodes; P0 represents a 0%~9% utilization ratio (containers-utilization / node-total-resource), P1 represents 10%~19%, P2 represents 20%~29%, ..., and finally P9 represents 90%~100%. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
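The P0..P9 decile bucketing described above can be sketched as follows. This is an illustrative example, not the actual SLS implementation; the class and method names are hypothetical:

```java
import java.util.Arrays;

public class UtilizationHistogram {
    // Bucket each node's utilization ratio (containers-utilization /
    // node-total-resource, in [0, 1]) into deciles: P0 = 0%~9%, P1 = 10%~19%,
    // ..., P9 = 90%~100%. A ratio of exactly 1.0 is folded into P9.
    static int[] bucketize(double[] nodeUtilizationRatios) {
        int[] buckets = new int[10];
        for (double ratio : nodeUtilizationRatios) {
            int idx = Math.min((int) (ratio * 10), 9);
            buckets[idx]++;
        }
        return buckets;
    }

    public static void main(String[] args) {
        double[] ratios = {0.05, 0.15, 0.95, 1.0};
        // P0 and P1 get one node each; P9 gets two (0.95 and 1.0)
        System.out.println(Arrays.toString(bucketize(ratios)));
    }
}
```

The stddev of these per-node ratios is then the single "smaller is better" number the issue proposes for comparing scheduling effects.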
[jira] [Updated] (YARN-8692) Support node utilization metrics for SLS
[ https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8692: --- Attachment: YARN-8692.001.patch
[jira] [Comment Edited] (YARN-8692) Support node utilization metrics for SLS
[ https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589719#comment-16589719 ] Tao Yang edited comment on YARN-8692 at 8/23/18 5:48 AM: - {quote} I am curious how node memory/cpu is calculated here? Is it based on the allocated memory/cpu? {quote} Yes, it is based on the allocated memory/cpu. The detailed calculation is as follows: {noformat} container-utilization = $container-allocated-resource * $task-utilization-ratio node-utilization = sum($container-utilization) {noformat} {{$task-utilization-ratio}} can be configured with an average and a standard deviation, so that we can generate whatever task-utilization-ratio samples we want for containers. For example, we can configure {{"memory_utilization_ratio":{ "val": 0.5, "std": 0.01}}} for map tasks so that the memory utilization of map containers is calculated as below: {noformat} allocated-memory = 1000 memory-utilization-ratio-sample is a random double value from 0.49 to 0.51 memory-utilization-of-map-container = $allocated-memory * $memory-utilization-ratio-sample {noformat} As a result, the utilization of a map container can be 490, 491, 492, ..., 508, 509 or 510.
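The per-container sampling calculation above can be sketched as a minimal example. This is an assumption-laden illustration, not the actual SLS API: the names are hypothetical, and it draws from a Gaussian with the configured mean ("val") and standard deviation ("std"), clamped to [0, 1], which is one plausible reading of the avg/stddev configuration:

```java
import java.util.Random;

public class TaskUtilizationSampler {
    // container-utilization = allocated-resource * sampled-ratio, where the
    // ratio is drawn around a configured mean with a configured stddev.
    static double sampleUtilization(long allocated, double mean, double std,
                                    Random rng) {
        double ratio = mean + std * rng.nextGaussian();
        // keep the sampled ratio within a sane [0, 1] range
        ratio = Math.max(0.0, Math.min(1.0, ratio));
        return allocated * ratio;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        // e.g. "memory_utilization_ratio": { "val": 0.5, "std": 0.01 }
        double util = sampleUtilization(1000, 0.5, 0.01, rng);
        System.out.println(util); // close to 500 for allocated-memory = 1000
    }
}
```

Node utilization would then simply be the sum of these per-container samples for all containers on the node.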
[jira] [Commented] (YARN-8015) Support all types of placement constraint support for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589704#comment-16589704 ] Weiwei Yang commented on YARN-8015: --- Thanks [~sunilg]! > Support all types of placement constraint support for Capacity Scheduler > > > Key: YARN-8015 > URL: https://issues.apache.org/jira/browse/YARN-8015 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler > Reporter: Weiwei Yang > Assignee: Weiwei Yang > Priority: Critical > Fix For: 3.2.0 > > Attachments: YARN-8015.001.patch, YARN-8015.002.patch, > YARN-8015.003.patch, YARN-8015.004.patch > > > AppPlacementAllocator currently supports only intra-app anti-affinity placement constraints; once YARN-8002 and YARN-8013 are resolved, it will need to support inter-app constraints as well. This may also require some refactoring of the existing code logic. Use this JIRA to track.
[jira] [Commented] (YARN-8015) Support all types of placement constraint support for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589692#comment-16589692 ] Hudson commented on YARN-8015: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14818 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14818/]) YARN-8015. Support all types of placement constraint support for (sunilg: rev 1ac01444a24faee6f74f2e83d9521eb4e0be651b) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/SingleConstraintAppPlacementAllocator.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestSchedulingRequestContainerAllocation.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/TestSingleConstraintAppPlacementAllocator.java
[jira] [Updated] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1
[ https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sen Zhao updated YARN-8701: --- Description: If I configure *MaxResources* in fair-scheduler.xml, like this: {code}resource1=50{code} In the queue, the *MaxResources* value will change to {code}Max Resources: {code} I think the value of VCores should be *CLUSTER_VCORES*. was: If I configure *MaxResources* in fair-scheduler.xml, like this: {code}resource1=50{code} In the queue, the *MaxResources* value will change to {code}memory:CLUSTER_MEMORY, VCores:0, resource1:50{code} I think the value of VCores should be *CLUSTER_VCORES*. > If the single parameter in Resources#createResourceWithSameValue is greater > than Integer.MAX_VALUE, then the value of vcores will be -1 > --- > > Key: YARN-8701 > URL: https://issues.apache.org/jira/browse/YARN-8701 > Project: Hadoop YARN > Issue Type: Bug > Components: api > Reporter: Sen Zhao > Assignee: Sen Zhao > Priority: Major > Attachments: YARN-8701.001.patch > > > If I configure *MaxResources* in fair-scheduler.xml, like this: > {code}resource1=50{code} > In the queue, the *MaxResources* value will change to > {code}Max Resources: {code} > I think the value of VCores should be *CLUSTER_VCORES*.
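The class of bug this issue describes can be illustrated with a small sketch. The helper names below are hypothetical, not the actual Resources API: when a long resource value larger than Integer.MAX_VALUE is narrowed to an int (vcores is stored as an int), the cast silently wraps, and for a value like 2^32 - 1 it wraps to exactly -1:

```java
public class NarrowingCastDemo {
    // Unsafe narrowing: mirrors the bug pattern. For inputs above
    // Integer.MAX_VALUE the low 32 bits are kept, so the result wraps.
    static int toVcoresUnsafe(long value) {
        return (int) value;
    }

    // A guarded alternative: clamp to Integer.MAX_VALUE instead of wrapping.
    static int toVcoresSafe(long value) {
        return (int) Math.min(value, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(toVcoresUnsafe(4294967295L));            // -1
        System.out.println(toVcoresUnsafe(Integer.MAX_VALUE + 1L)); // -2147483648
        System.out.println(toVcoresSafe(4294967295L));              // 2147483647
    }
}
```

The exact wrapped value depends on the input's low 32 bits; the -1 in the issue title corresponds to an input whose low 32 bits are all ones.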
[jira] [Commented] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1
[ https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589689#comment-16589689 ] genericqa commented on YARN-8701: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 23s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 5s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 66m 14s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8701 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936755/YARN-8701.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 42f5e9a8913d 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b021249 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21664/testReport/ | | Max. process+thread count | 329 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21664/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589688#comment-16589688 ] genericqa commented on YARN-8649: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 26s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 45s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 7s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 74m 22s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8649 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936754/YARN-8649_5.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux d74f059f170e 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b021249 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21663/testReport/ | | Max. process+thread count | 336 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21663/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.
[jira] [Updated] (YARN-8015) Support affinity placement constraint support for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-8015: - Summary: Support affinity placement constraint support for Capacity Scheduler (was: Complete placement constraint support for Capacity Scheduler)
[jira] [Updated] (YARN-8015) Support all types of placement constraint support for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-8015: - Summary: Support all types of placement constraint support for Capacity Scheduler (was: Support affinity placement constraint support for Capacity Scheduler)
[jira] [Commented] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589682#comment-16589682 ] Zhankun Tang commented on YARN-8698: [~yuan_zac] Yeah, a wrong HADOOP_COMMON_HOME env will cause "hadoop classpath" to fail. But did you specify "DOCKER_HADOOP_HDFS_HOME" to point at the Hadoop home directory inside your Docker image? I guess that if this is specified, at least run-PRIMARY_WORKER.sh won't fail? {code:java} yarn jar path-to/hadoop-yarn-applications-submarine-3.2.0-SNAPSHOT.jar job run \ --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/ \ --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 ... {code} > [Submarine] Failed to add hadoop dependencies in docker container when > submitting a submarine job > - > > Key: YARN-8698 > URL: https://issues.apache.org/jira/browse/YARN-8698 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Zac Zhou > Assignee: Zac Zhou > Priority: Major > Attachments: YARN-8698.001.patch > > > When a standalone submarine tf job is submitted, the following error occurs: > INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11) > INFO:tensorflow:Done calling model_fn. > INFO:tensorflow:Create CheckpointSaverHook. > hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, > kerbTicketCachePath=(NULL), userNa > me=(NULL)) error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError) > hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, > kerbTicketCachePath=(NULL), userNa > me=(NULL)) error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError) > > This error may be related to the hadoop classpath. > The Hadoop env variables in launch_container.sh are as follows: > export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"} > export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"} > > run-PRIMARY_WORKER.sh is like: > export HADOOP_YARN_HOME= > export HADOOP_HDFS_HOME=/hadoop-3.1.0 > export HADOOP_CONF_DIR=$WORK_DIR >
[jira] [Commented] (YARN-8692) Support node utilization metrics for SLS
[ https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589680#comment-16589680 ] Weiwei Yang commented on YARN-8692: --- +1 for the idea, it will be very helpful for testing load distribution. I am curious how node memory/cpu is calculated here? Is it based on the allocated memory/cpu?
[jira] [Assigned] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1
[ https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sen Zhao reassigned YARN-8701: -- Assignee: Sen Zhao Attachment: YARN-8701.001.patch > If the single parameter in Resources#createResourceWithSameValue is greater > than Integer.MAX_VALUE, then the value of vcores will be -1 > --- > > Key: YARN-8701 > URL: https://issues.apache.org/jira/browse/YARN-8701 > Project: Hadoop YARN > Issue Type: Bug > Components: api >Reporter: Sen Zhao >Assignee: Sen Zhao >Priority: Major > Attachments: YARN-8701.001.patch > > > If I configure *MaxResources* in fair-scheduler.xml, like this: > {code}resource1=50{code} > In the queue, the *MaxResources* value will change to > {code}memory:CLUSTER_MEMORY, VCores:0, > resource1:50{code} > I think the value of VCores should be *CLUSTER_VCORES*. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1
[ https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sen Zhao updated YARN-8701: --- Description: If I configure *MaxResources* in fair-scheduler.xml, like this: {code}resource1=50{code} In the queue, the *MaxResources* value will change to {code}memory:CLUSTER_MEMORY, VCores:0, resource1:50{code} I think the value of VCores should be *CLUSTER_VCORES*. > If the single parameter in Resources#createResourceWithSameValue is greater > than Integer.MAX_VALUE, then the value of vcores will be -1 > --- > > Key: YARN-8701 > URL: https://issues.apache.org/jira/browse/YARN-8701 > Project: Hadoop YARN > Issue Type: Bug > Components: api >Reporter: Sen Zhao >Priority: Major > > If I configure *MaxResources* in fair-scheduler.xml, like this: > {code}resource1=50{code} > In the queue, the *MaxResources* value will change to > {code}memory:CLUSTER_MEMORY, VCores:0, > resource1:50{code} > I think the value of VCores should be *CLUSTER_VCORES*. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
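The reported -1 vcores is consistent with a narrowing long-to-int cast: the low 32 bits of Long.MAX_VALUE are all ones, so casting it to int yields -1. A minimal demonstration with hypothetical method names (not the actual Resources#createResourceWithSameValue code):

```java
// Demonstrates the suspected narrowing cast behind YARN-8701.
// Method names are illustrative, not the actual YARN implementation.
public class VcoresOverflowDemo {

    // A plain narrowing cast silently truncates to the low 32 bits.
    static int toVcores(long value) {
        return (int) value;
    }

    // A safer variant clamps at Integer.MAX_VALUE instead of truncating.
    static int toVcoresClamped(long value) {
        return (int) Math.min(value, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(toVcores(Long.MAX_VALUE));        // -1
        System.out.println(toVcoresClamped(Long.MAX_VALUE)); // 2147483647
    }
}
```

Clamping (or using the cluster's vcore total, as the reporter suggests) avoids the negative value while keeping the memory field unchanged.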
[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589652#comment-16589652 ] lujie commented on YARN-8649: - [~jlowe] I have improved the log as you suggested. Thanks for your nice review. > Similar as YARN-4355:NPE while processing localizer heartbeat > - > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: lujie >Assignee: lujie >Priority: Major > Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, > YARN-8649_4.patch, YARN-8649_5.patch, hadoop-hires-nodemanager-hadoop11.log > > > I have noticed that a nodemanager was getting NPEs while tearing down. The > reason maybe similar to YARN-4355 which is reported by [# Jason Lowe].
[jira] [Comment Edited] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589652#comment-16589652 ] lujie edited comment on YARN-8649 at 8/23/18 3:32 AM: -- @ [~jlowe] I have improved the log as your suggestion. thanks for your nice review was (Author: xiaoheipangzi): @ [~jlowe] I have improved the log as your suggestion. thanks for your cice review > Similar as YARN-4355:NPE while processing localizer heartbeat > - > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: lujie >Assignee: lujie >Priority: Major > Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, > YARN-8649_4.patch, YARN-8649_5.patch, hadoop-hires-nodemanager-hadoop11.log > > > I have noticed that a nodemanager was getting NPEs while tearing down. The > reason maybe similar to YARN-4355 which is reported by [# Jason Lowe]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8649: Attachment: YARN-8649_5.patch > Similar as YARN-4355:NPE while processing localizer heartbeat > - > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: lujie >Assignee: lujie >Priority: Major > Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, > YARN-8649_4.patch, YARN-8649_5.patch, hadoop-hires-nodemanager-hadoop11.log > > > I have noticed that a nodemanager was getting NPEs while tearing down. The > reason maybe similar to YARN-4355 which is reported by [# Jason Lowe]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8691) AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum size
[ https://issues.apache.org/jira/browse/YARN-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan reassigned YARN-8691: Assignee: Yicong Cai > AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum > size > -- > > Key: YARN-8691 > URL: https://issues.apache.org/jira/browse/YARN-8691 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.3 >Reporter: Yicong Cai >Assignee: Yicong Cai >Priority: Critical > Fix For: 2.7.7 > > > SparkSQL AM Codegen ERROR,then call unregister AM API and send the error > message to RM, RM receive the AM state and update to RMStateStore. The > Codegen error message maybe is huge, (Our case is about 200MB). If the > RMStateStore is ZKRMStateStore, it causes the same exception as YARN-6125, > but YARN-6125 doesn't cover the unregisterApplicationMaster's message cut. > > SparkSQL Codegen error message show below: > 18/08/18 08:34:54 ERROR codegen.CodeGenerator: failed to compile: > org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM > limit of 0x > /* 001 */ public java.lang.Object generate(Object[] references) > { /* 002 */ return new SpecificSafeProjection(references); /* 003 */ } > /* 004 */ > /* 005 */ class SpecificSafeProjection extends > org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection { > .. > about 2 million lines. > .. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8691) AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum size
[ https://issues.apache.org/jira/browse/YARN-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589647#comment-16589647 ] Sunil Govindan commented on YARN-8691: -- Thanks [~caiyicong], assigned to you. > AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum > size > -- > > Key: YARN-8691 > URL: https://issues.apache.org/jira/browse/YARN-8691 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.3 >Reporter: Yicong Cai >Assignee: Yicong Cai >Priority: Critical > Fix For: 2.7.7 > > > SparkSQL AM Codegen ERROR,then call unregister AM API and send the error > message to RM, RM receive the AM state and update to RMStateStore. The > Codegen error message maybe is huge, (Our case is about 200MB). If the > RMStateStore is ZKRMStateStore, it causes the same exception as YARN-6125, > but YARN-6125 doesn't cover the unregisterApplicationMaster's message cut. > > SparkSQL Codegen error message show below: > 18/08/18 08:34:54 ERROR codegen.CodeGenerator: failed to compile: > org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM > limit of 0x > /* 001 */ public java.lang.Object generate(Object[] references) > { /* 002 */ return new SpecificSafeProjection(references); /* 003 */ } > /* 004 */ > /* 005 */ class SpecificSafeProjection extends > org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection { > .. > about 2 million lines. > ..
[jira] [Created] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1
Sen Zhao created YARN-8701: -- Summary: If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1 Key: YARN-8701 URL: https://issues.apache.org/jira/browse/YARN-8701 Project: Hadoop YARN Issue Type: Bug Components: api Reporter: Sen Zhao
[jira] [Created] (YARN-8700) Application cannot un-registered
fox created YARN-8700: - Summary: Application cannot un-registered Key: YARN-8700 URL: https://issues.apache.org/jira/browse/YARN-8700 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.3 Reporter: fox Dear all, I found a problem with application unregistration in AWS EMR environment (emr-5.8.0, hadoop 2.7.3, spark 2.2.0). Application Type: Both Yarn and Spark State: RUNNING Inside the job logs, I got 07:00:07.190 [main] INFO c.w.c.e.a.n.b.AbstractNormalBatchMain - [EDP2] Ready to run Tear Down 07:00:07.192 [main] INFO c.w.c.e.a.n.b.AbstractNormalBatchMain - [EDP2] Ready to run Tear Down 07:00:07.192 [main] INFO c.w.c.e.a.n.b.AbstractNormalBatchMain - [EDP2] Job Finish 07:00:07.195 [main] INFO o.s.c.a.AnnotationConfigApplicationContext - Closing org.springframework.context.annotation.AnnotationConfigApplicationContext@144ab54: startup date [Tue Aug 21 06:59:23 UTC 2018]; root of context hierarchy 07:00:07.306 [main] INFO o.s.s.c.ThreadPoolTaskExecutor - Shutting down ExecutorService 'redisClusterExecutor' 07:00:07.551 [main] INFO o.a.k.clients.producer.KafkaProducer - Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. 07:00:07.565 [main] INFO c.w.c.f.m.MessageQueueKafkaProducerImpl - Closed all the producer's connections for tenant: 7fd0356c-1258-11e8-abfd-0242ac110002. 07:00:09.869 [main] INFO c.w.c.edp2.normal.batch.AppMaster - finish run main method 07:00:09.870 [main] INFO c.w.c.edp2.normal.batch.AppMaster - delete temp file /tmp/aa33f388-f591-44a8-9aa3-13e2f8427c5d2802069659156113885.jar 07:00:10.112 [main] INFO o.a.h.y.c.api.impl.AMRMClientImpl - Waiting for application to be successfully unregistered. 07:00:10.215 [main] INFO o.a.h.y.c.api.impl.AMRMClientImpl - Waiting for application to be successfully unregistered. 07:00:10.319 [main] INFO o.a.h.y.c.api.impl.AMRMClientImpl - Waiting for application to be successfully unregistered. 
07:00:10.422 [main] INFO o.a.h.y.c.api.impl.AMRMClientImpl - Waiting for application to be successfully unregistered. 07:00:10.528 [main] INFO o.a.h.y.c.api.impl.AMRMClientImpl - Waiting for application to be successfully unregistered. and it keeps more than one day until I stopped the whole cluster. I also try to kill the application by yarn command, which also keeps forever waiting for application to be killed. hadoop@ip-10-100-2-124 ~]$ yarn application -kill application_1534810852740_0721 18/08/22 12:24:28 INFO impl.TimelineClientImpl: Timeline service address: http://ip-10-100-2-124.ap-northeast-1.compute.internal:8188/ws/v1/timeline/ 18/08/22 12:24:29 INFO client.RMProxy: Connecting to ResourceManager at ip-10-100-2-124.ap-northeast-1.compute.internal/10.100.2.124:8032 Killing application application_1534810852740_0721 18/08/22 12:24:32 INFO impl.YarnClientImpl: Waiting for application application_1534810852740_0721 to be killed. 18/08/22 12:24:34 INFO impl.YarnClientImpl: Waiting for application application_1534810852740_0721 to be killed. 18/08/22 12:24:36 INFO impl.YarnClientImpl: Waiting for application application_1534810852740_0721 to be killed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589605#comment-16589605 ] Weiwei Yang commented on YARN-7863: --- Hi [~Naganarasimha], unlike partitions, a PC is not associated with resources. {quote}there is no way to find out for a given queue how much pending resources are there in each partition it can access. {quote} Since PCs are not associated with resources, a PC acts as an extra check after all other checks are done. The scheduler still calculates how much resource is available in a partition for a given queue and assigns resources from a node in that partition to a request, but if the PC is not satisfied then the allocation proposal is rejected. Partition support in PCs is not ready; to be honest, I am not sure everything is aligned with the existing label-based scheduling. I suggested in YARN-8015 to open a separate task to further enhance that. {quote}And also i am not able to envisage the scenario where in partition needs to be OR'd with Allocation tags or Attributes. {quote} Agreed, it won't make sense to put an OR between a partition constraint and an allocation-tag/attribute constraint, but other combinations are useful. We support them; whether a given PC is really meaningful is up to the user. 
> Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7863-YARN-3409.002.patch, > YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, > YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, > YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, > YARN-7863.v0.patch > > > This Jira will track to *Modify existing placement constraints to support > node attributes.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589597#comment-16589597 ] Zac Zhou commented on YARN-8698: Thanks a lot, [~leftnoteasy] :) Hi [~tangzhankun], I think this issue is related to the hadoop classpath. The hadoop path on the nodemanager is different from the one in the docker container. launch_container.sh sets HADOOP_COMMON_HOME to a path which doesn't exist in the docker container, so run-PRIMARY_WORKER.sh failed to execute the command: export CLASSPATH=`$HADOOP_HDFS_HOME/bin/hadoop classpath --glob` and the classpath can't be generated correctly. I validated this issue with the following steps: # move the hadoop package to some path, like A. # set HADOOP_COMMON_HOME to some other path, like B, which is not the hadoop package location: export HADOOP_COMMON_HOME=B # execute the command: ${A}/bin/hadoop classpath --glob We will get the following error: Error: Could not find or load main class org.apache.hadoop.util.Classpath If any more info is needed, feel free to let me know~ Thanks > [Submarine] Failed to add hadoop dependencies in docker container when > submitting a submarine job > - > > Key: YARN-8698 > URL: https://issues.apache.org/jira/browse/YARN-8698 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zac Zhou >Assignee: Zac Zhou >Priority: Major > Attachments: YARN-8698.001.patch > > > When a standalone submarine tf job is submitted, the following error is got : > INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11) > INFO:tensorflow:Done calling model_fn. > INFO:tensorflow:Create CheckpointSaverHook. 
> hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, > kerbTicketCachePath=(NULL), userNa > me=(NULL)) error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError) > hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, > kerbTicketCachePath=(NULL), userNa > me=(NULL)) error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError) > > This error may be related to hadoop classpath > Hadoop env variables of launch_container.sh are as follows: > export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"} > export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"} > > run-PRIMARY_WORKER.sh is like: > export HADOOP_YARN_HOME= > export HADOOP_HDFS_HOME=/hadoop-3.1.0 > export HADOOP_CONF_DIR=$WORK_DIR > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8561) [Submarine] Initial implementation: Training job submission and job history retrieval
[ https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589561#comment-16589561 ] Zhankun Tang commented on YARN-8561: [~leftnoteasy] I'm going through the code. And a minor problem about below code: {code:java} boolean lackingEnvs = false; ...// set lackingEnvs to true based on some conditions if (lackingEnvs) { LOG.error("When hdfs is being used to read/write models/data. Following" + "envs are required: 1) DOCKER_HADOOP_HDFS_HOME= 2) DOCKER_JAVA_HOME=. You can use --env to pass these envars."); throw new IOException("Failed to detect HDFS-related environments."); } {code} It seems that if users don't specify these two required environment variables, the error message won't be thrown. Is it expected? > [Submarine] Initial implementation: Training job submission and job history > retrieval > - > > Key: YARN-8561 > URL: https://issues.apache.org/jira/browse/YARN-8561 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8561.001.patch, YARN-8561.002.patch, > YARN-8561.003.patch, YARN-8561.004.patch, YARN-8561.005.patch > > > Added following parts: > 1) New subcomponent of YARN, under applications/ project. > 2) Tensorflow training job submission, including training (single node and > distributed). > - Supported Docker container. > - Support GPU isolation. > - Support YARN registry DNS. > 3) Retrieve job history. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
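The concern in the comment above is that lackingEnvs may never become true, so the check silently passes. A minimal sketch of the validation the snippet appears to intend, with hypothetical class and variable names (not the actual Submarine code):

```java
import java.io.IOException;
import java.util.Map;

// Hypothetical sketch of the env validation discussed above: explicitly
// check both required variables and fail fast. Not the actual Submarine code.
public class EnvCheckSketch {

    static void checkHdfsEnvs(Map<String, String> envs) throws IOException {
        // lackingEnvs is true when either required variable is absent.
        boolean lackingEnvs = !envs.containsKey("DOCKER_HADOOP_HDFS_HOME")
            || !envs.containsKey("DOCKER_JAVA_HOME");
        if (lackingEnvs) {
            throw new IOException("When HDFS is used to read/write models/data,"
                + " the following envs are required:"
                + " 1) DOCKER_HADOOP_HDFS_HOME 2) DOCKER_JAVA_HOME."
                + " Use --env to pass them.");
        }
    }

    public static void main(String[] args) {
        try {
            checkHdfsEnvs(java.util.Collections.emptyMap());
            System.out.println("no error");
        } catch (IOException e) {
            System.out.println("threw as expected");
        }
    }
}
```

With this shape, omitting either variable always raises the error, which is the behavior the reviewer is asking about.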
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589559#comment-16589559 ] Naganarasimha G R commented on YARN-7863: - Hi [~sunilg], a few other nits in the patch: * NodeAttributesManagerImpl ln no 205: as I mentioned, there are both debug and info logs here; I think we can remove the debug log. * NodeAttributesManagerImpl ln no 212-226: Here you are sending out a complete update of the node collections, not just the modified NMs. This has multiple impacts: in a large cluster we unnecessarily send a lot of updates to the scheduler, and removed attributes will not be captured this way; the latter is more important. * NodeAttributesManagerImpl ln no 212-226: The earlier idea was to have the scheduler make use of the AttributeValue, so that the converted value is stored and used for comparison. But if we have the flexibility later on to change the scheduler event being pushed from NAM to the schedulers, then I am fine with the event being sent out; otherwise I would suggest sending the AttributeValue itself. * Are test cases for AND and OR covered? 
Though I could see AND is not explicitly covered in TestPlacementConstraintParser, it would be better to cover AND and OR explicitly. * PlacementSpec ln no 51: typo "teh" > Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7863-YARN-3409.002.patch, > YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, > YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, > YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, > YARN-7863.v0.patch > > > This Jira will track to *Modify existing placement constraints to support > node attributes.*
[jira] [Commented] (YARN-8685) Add containers query support for nodes/node REST API in RMWebServices
[ https://issues.apache.org/jira/browse/YARN-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589551#comment-16589551 ] Tao Yang commented on YARN-8685: {quote} How about my suggestion about adding a new endpoint in RM? Does that make sense to you? {quote} Yes, it makes sense to me. : ) > Add containers query support for nodes/node REST API in RMWebServices > - > > Key: YARN-8685 > URL: https://issues.apache.org/jira/browse/YARN-8685 > Project: Hadoop YARN > Issue Type: Improvement > Components: restapi >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8685.001.patch > > > Currently we can only query running containers from NM containers REST API, > but can't get the valid containers which are in ALLOCATED/ACQUIRED state. We > have the requirements to get all containers allocated on specified nodes for > debugging. I want to add a "includeContainers" query param (default false) > for nodes/node REST API in RMWebServices, so that we can get valid containers > on nodes if "includeContainers=true" specified. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8697) LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when cannot resolve resource
[ https://issues.apache.org/jira/browse/YARN-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589541#comment-16589541 ] genericqa commented on YARN-8697: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 52s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 69m 53s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8697 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936716/YARN-8697.v1.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 4eba3d4b80df 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / af4b705 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21662/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21662/testReport/ | | Max. process+thread count | 440 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: hadoop-yarn-project
[jira] [Commented] (YARN-8696) FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589528#comment-16589528 ] genericqa commented on YARN-8696: -

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 27s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 48s | trunk passed |
| +1 | compile | 9m 23s | trunk passed |
| +1 | checkstyle | 1m 27s | trunk passed |
| +1 | mvnsite | 4m 3s | trunk passed |
| +1 | shadedclient | 16m 58s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 6m 14s | trunk passed |
| +1 | javadoc | 3m 3s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 3m 6s | the patch passed |
| +1 | compile | 7m 39s | the patch passed |
| +1 | javac | 7m 39s | the patch passed |
| +1 | checkstyle | 1m 25s | the patch passed |
| +1 | mvnsite | 3m 43s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 8s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 6m 54s | the patch passed |
| +1 | javadoc | 2m 53s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 46s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 3m 14s | hadoop-yarn-common in the patch passed. |
| +1 | unit | 2m 23s | hadoop-yarn-server-common in the patch passed. |
| +1 | unit | 18m 35s | hadoop-yarn-server-nodemanager in the patch passed. |
| -1 | unit | 70m 29s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 43s | The patch does not generate ASF License warnings. |
| | | 192m 55s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer |
| | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8696 |
| JIRA Patch URL | https://issue
[jira] [Commented] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589498#comment-16589498 ] Zhankun Tang commented on YARN-8698: [~yuan_zac] Thanks for the patch! One question: does the patch resolve the issue you hit? If possible, could you post more details about your TensorFlow environment and job script so that I can reproduce the issue and double-check? Thanks. > [Submarine] Failed to add hadoop dependencies in docker container when > submitting a submarine job > - > > Key: YARN-8698 > URL: https://issues.apache.org/jira/browse/YARN-8698 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zac Zhou >Assignee: Zac Zhou >Priority: Major > Attachments: YARN-8698.001.patch > > > When a standalone submarine tf job is submitted, the following error is got : > INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11) > INFO:tensorflow:Done calling model_fn. > INFO:tensorflow:Create CheckpointSaverHook. 
> hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, > kerbTicketCachePath=(NULL), userNa > me=(NULL)) error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError) > hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, > kerbTicketCachePath=(NULL), userNa > me=(NULL)) error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError) > > This error may be related to hadoop classpath > Hadoop env variables of launch_container.sh are as follows: > export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"} > export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"} > > run-PRIMARY_WORKER.sh is like: > export HADOOP_YARN_HOME= > export HADOOP_HDFS_HOME=/hadoop-3.1.0 > export HADOOP_CONF_DIR=$WORK_DIR > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8696) FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589484#comment-16589484 ] genericqa commented on YARN-8696: -

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 24m 21s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 40s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 52s | trunk passed |
| +1 | compile | 8m 46s | trunk passed |
| +1 | checkstyle | 1m 28s | trunk passed |
| +1 | mvnsite | 3m 55s | trunk passed |
| +1 | shadedclient | 16m 55s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 6m 40s | trunk passed |
| +1 | javadoc | 3m 10s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 15s | Maven dependency ordering for patch |
| +1 | mvninstall | 3m 40s | the patch passed |
| +1 | compile | 8m 56s | the patch passed |
| +1 | javac | 8m 56s | the patch passed |
| +1 | checkstyle | 1m 24s | the patch passed |
| +1 | mvnsite | 3m 57s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 22s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 7m 12s | the patch passed |
| +1 | javadoc | 3m 3s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 55s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 3m 26s | hadoop-yarn-common in the patch passed. |
| +1 | unit | 2m 34s | hadoop-yarn-server-common in the patch passed. |
| +1 | unit | 18m 48s | hadoop-yarn-server-nodemanager in the patch passed. |
| -1 | unit | 70m 35s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 43s | The patch does not generate ASF License warnings. |
| | | 220m 47s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8696 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936691/YARN-8696.v2.patch |
| Optional Tests
[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable
[ https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589478#comment-16589478 ] Wangda Tan commented on YARN-8638: -- [~ccondit-target], Thanks for working on this ticket. It would be very clean if we can make runc/containerd a separate ContainerRuntime implementation. But I am not sure whether all the common logic, such as ContainerLaunch/LinuxContainerExecutor, works as-is for containerd/runc. If invasive changes are required, we may have to consider moving the abstraction to the ContainerExecutor level, etc. > Allow linux container runtimes to be pluggable > -- > > Key: YARN-8638 > URL: https://issues.apache.org/jira/browse/YARN-8638 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Minor > Attachments: YARN-8638.001.patch, YARN-8638.002.patch > > > YARN currently supports three different Linux container runtimes (default, > docker, and javasandbox). However, it would be relatively straightforward to > support arbitrary runtime implementations. This would enable easier > experimentation with new and emerging runtime technologies (runc, containerd, > etc.) without requiring a rebuild and redeployment of Hadoop. > This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would > now allow arbitrary values. Additionally, > {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the > {{LinuxContainerRuntime}} implementation to instantiate. A no-argument > constructor should be sufficient, as {{LinuxContainerRuntime}} already > provides an {{initialize()}} method. 
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> > env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} > could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and > added to the {{LinuxContainerRuntime}} interface. This would allow > {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on > whether that runtime claimed ownership of the current container execution. > For backwards compatibility, the existing values (default,docker,javasandbox) > would continue to be supported as-is. Under the current logic, the evaluation > order is javasandbox, docker, default (with default being chosen if no other > candidates are available). Under the new evaluation logic, pluggable runtimes > would be evaluated after docker and before default, in the order in which > they are defined in the allowed-runtimes list. This will change no behavior > on current clusters (as there would be no pluggable runtimes defined), and > preserves behavior with respect to ordering of existing runtimes.
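The evaluation order described above can be sketched as a tiny standalone model. This is only a sketch: the `isRuntimeRequested` method mirrors the proposal, but the `RuntimeSketch`/`DelegatingSketch` names and the env-variable convention are hypothetical, not the actual NodeManager code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for LinuxContainerRuntime with the proposed
// generalized isRuntimeRequested(Map<String, String> env) method.
interface RuntimeSketch {
    String name();
    boolean isRuntimeRequested(Map<String, String> env);
}

// Model of the proposed DelegatingLinuxContainerRuntime selection logic:
// evaluate runtimes in allowed-runtimes order, and fall back to the last
// registered runtime (the "default") when none claims the container.
// Assumes a default runtime is always registered last.
class DelegatingSketch {
    private final List<RuntimeSketch> allowed = new ArrayList<>();

    void register(RuntimeSketch r) { allowed.add(r); }

    RuntimeSketch pick(Map<String, String> env) {
        for (RuntimeSketch r : allowed) {
            if (r.isRuntimeRequested(env)) {
                return r; // first runtime to claim ownership wins
            }
        }
        return allowed.get(allowed.size() - 1); // default fallback
    }
}
```

Registering runtimes in the order docker, pluggable, default reproduces the evaluation order in the description: pluggable runtimes are consulted after docker and before default, and default is chosen only when no other runtime claims the container.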
[jira] [Updated] (YARN-8697) LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when cannot resolve resource
[ https://issues.apache.org/jira/browse/YARN-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8697: --- Attachment: YARN-8697.v1.patch > LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when > cannot resolve resource > --- > > Key: YARN-8697 > URL: https://issues.apache.org/jira/browse/YARN-8697 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Attachments: YARN-8697.v1.patch > > > Right now in LocalityMulticastAMRMProxyPolicy, whenever we cannot resolve the > resource name (node or rack), we always route the request to the home > sub-cluster. However, the home sub-cluster might not always be ready to use > (timed out, YARN-8581) or enabled (by AMRMProxyPolicy weights). It might also > be overwhelmed by requests if the sub-cluster resolver has an issue. In this > Jira, we change it to pick a random active and enabled sub-cluster for > resource requests we cannot resolve.
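The proposed fallback can be sketched as follows. The class and field names here are illustrative, not the actual LocalityMulticastAMRMProxyPolicy API; "active" stands for a registered, heartbeating sub-cluster and "enabled" for a non-zero AMRMProxyPolicy weight.

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Illustrative sub-cluster record for the routing sketch below.
class SubCluster {
    final String id;
    final boolean active;   // registered and heartbeating
    final boolean enabled;  // non-zero AMRMProxyPolicy weight
    SubCluster(String id, boolean active, boolean enabled) {
        this.id = id; this.active = active; this.enabled = enabled;
    }
}

// Sketch of the proposed change: when a resource name (node/rack) cannot
// be resolved, pick a random active *and* enabled sub-cluster instead of
// always routing to the home sub-cluster.
class UnresolvedRequestRouter {
    private final Random rand;
    UnresolvedRequestRouter(Random rand) { this.rand = rand; }

    String route(List<SubCluster> clusters, String homeId) {
        List<SubCluster> candidates = clusters.stream()
            .filter(c -> c.active && c.enabled)
            .collect(Collectors.toList());
        if (candidates.isEmpty()) {
            return homeId; // last resort: keep the old home-routing behavior
        }
        return candidates.get(rand.nextInt(candidates.size())).id;
    }
}
```

Randomizing over all eligible sub-clusters spreads unresolvable requests instead of concentrating them on a possibly timed-out or disabled home sub-cluster, which is the failure mode the description calls out.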
[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589432#comment-16589432 ] Jason Lowe commented on YARN-8649: -- Thanks for updating the patch! The logic looks good overall, but I have some concerns about the logging that was added.

I think it's misleading to assume the NM is shutting down when this situation occurs. As I understand it, the main trigger for this scenario is a container getting killed while it is still localizing. That can happen when the NM shuts down, but it can also happen without the NM shutting down. Therefore it seems inappropriate to assume this scenario means the NM is shutting down. There are already separate logs when the NM decides to shut down, so it is probably best to keep this logging to just the fact that the resource was removed before we got around to localizing it and therefore will no longer be localized.

The warning log should show the source resource, similar to what is done in the public localization debug code that was added, rather than the local path. The local path won't mean as much as the resource that was requested, as that source resource path was logged when it was initially requested by the container.

There is debug logging in the public localizer case but not the private case, which is inconsistent. Arguably, if it's useful for the public case it would be useful for the private case. Given there's a loud warning log already in the common getPathForLocalization code, I'm not sure the debug log in the public path adds any value, especially if we change the loud warning log to show the source path. 
> Similar as YARN-4355:NPE while processing localizer heartbeat > - > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: lujie >Assignee: lujie >Priority: Major > Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, > YARN-8649_4.patch, hadoop-hires-nodemanager-hadoop11.log > > > I have noticed that a nodemanager was getting NPEs while tearing down. The > reason may be similar to YARN-4355, which was reported by [# Jason Lowe].
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589430#comment-16589430 ] Eric Yang commented on YARN-8569: - From today's docker meetup discussion: the distributed cache is not an ideal interface for replicating frequently changing cluster information. If the file checksum changes due to a cluster information update, the file may not get re-replicated through the distributed cache. This information is closer to token generation and population than to jar file distribution, so I will change the population mechanism to align with token population instead of the distributed cache. > Create an interface to provide cluster information to application > - > > Key: YARN-8569 > URL: https://issues.apache.org/jira/browse/YARN-8569 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8569.001.patch > > > Some program requires container hostnames to be known for application to run. 
> For example, distributed tensorflow requires launch_command that looks like: > {code} > # On ps0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=0 > # On ps1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=1 > # On worker0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=0 > # On worker1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=1 > {code} > This is a bit cumbersome to orchestrate via Distributed Shell, or YARN > services launch_command. In addition, the dynamic parameters do not work > with YARN flex command. This is the classic pain point for application > developer attempt to automate system environment settings as parameter to end > user application. > It would be great if YARN Docker integration can provide a simple option to > expose hostnames of the yarn service via a mounted file. The file content > gets updated when flex command is performed. This allows application > developer to consume system environment settings via a standard interface. > It is like /proc/devices for Linux, but for Hadoop. This may involve > updating a file in distributed cache, and allow mounting of the file via > container-executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
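If the cluster information were exposed as a mounted file, the application side could consume it roughly as below. The one-entry-per-line "role host:port" format, the class name, and the ports are assumptions for illustration only — no file format has been committed in this JIRA.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: read a YARN-mounted cluster-info file (assumed format:
// one "role hostname:port" entry per line) and build the comma-separated
// host lists that --ps_hosts / --worker_hosts style flags expect.
class ClusterInfoReader {
    static String hostsFor(List<String> lines, String role) {
        return lines.stream()
            .filter(l -> l.startsWith(role + " "))
            .map(l -> l.substring(role.length() + 1))
            .collect(Collectors.joining(","));
    }

    // Convenience overload reading directly from the mounted file; the
    // mount point itself would be provided by the container runtime.
    static String hostsFor(Path mountedFile, String role) throws IOException {
        return hostsFor(Files.readAllLines(mountedFile), role);
    }
}
```

Because the file contents would be refreshed on flex, re-reading it gives the application an up-to-date membership view without new launch parameters, which is the interface the description proposes.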
[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable
[ https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589403#comment-16589403 ] Chandni Singh commented on YARN-8638: - [~ccondit-target] The change looks good to me. I have a question about the pluggable class: will there be any plugin discovery mechanism, or should the plugin class be on the NM's classpath? > Allow linux container runtimes to be pluggable > -- > > Key: YARN-8638 > URL: https://issues.apache.org/jira/browse/YARN-8638 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Minor > Attachments: YARN-8638.001.patch, YARN-8638.002.patch > > > YARN currently supports three different Linux container runtimes (default, > docker, and javasandbox). However, it would be relatively straightforward to > support arbitrary runtime implementations. This would enable easier > experimentation with new and emerging runtime technologies (runc, containerd, > etc.) without requiring a rebuild and redeployment of Hadoop. > This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would > now allow arbitrary values. Additionally, > {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the > {{LinuxContainerRuntime}} implementation to instantiate. A no-argument > constructor should be sufficient, as {{LinuxContainerRuntime}} already > provides an {{initialize()}} method. 
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> > env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} > could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and > added to the {{LinuxContainerRuntime}} interface. This would allow > {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on > whether that runtime claimed ownership of the current container execution. > For backwards compatibility, the existing values (default,docker,javasandbox) > would continue to be supported as-is. Under the current logic, the evaluation > order is javasandbox, docker, default (with default being chosen if no other > candidates are available). Under the new evaluation logic, pluggable runtimes > would be evaluated after docker and before default, in the order in which > they are defined in the allowed-runtimes list. This will change no behavior > on current clusters (as there would be no pluggable runtimes defined), and > preserves behavior with respect to ordering of existing runtimes.
[jira] [Resolved] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service
[ https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang resolved YARN-8675. - Resolution: Not A Problem Reopened by accident during the docker meeting; closing again. > Setting hostname of docker container breaks with "host" networking mode for > Apps which do not run as a YARN service > --- > > Key: YARN-8675 > URL: https://issues.apache.org/jira/browse/YARN-8675 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Major > Labels: Docker > > Applications like the Spark AM currently do not run as a YARN service and > setting hostname breaks driver/executor communication if docker version > >=1.13.1 , especially with wire-encryption turned on. > YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could > have a mix of YARN service/native Applications. > The proposal is to not set the hostname when "host" networking mode is > enabled.
[jira] [Reopened] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service
[ https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger reopened YARN-8675: --- > Setting hostname of docker container breaks with "host" networking mode for > Apps which do not run as a YARN service > --- > > Key: YARN-8675 > URL: https://issues.apache.org/jira/browse/YARN-8675 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Major > Labels: Docker > > Applications like the Spark AM currently do not run as a YARN service and > setting hostname breaks driver/executor communication if docker version > >=1.13.1 , especially with wire-encryption turned on. > YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could > have a mix of YARN service/native Applications. > The proposal is to not set the hostname when "host" networking mode is > enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589242#comment-16589242 ] Wangda Tan commented on YARN-8698: -- Thanks [~yuan_zac], added u to the contributor list, you can assign YARN JIRA to yourself now. And please file submarine-related tickets under YARN-8135 in the future. > [Submarine] Failed to add hadoop dependencies in docker container when > submitting a submarine job > - > > Key: YARN-8698 > URL: https://issues.apache.org/jira/browse/YARN-8698 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zac Zhou >Assignee: Zac Zhou >Priority: Major > Attachments: YARN-8698.001.patch > > > When a standalone submarine tf job is submitted, the following error is got : > INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11) > INFO:tensorflow:Done calling model_fn. > INFO:tensorflow:Create CheckpointSaverHook. > hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, > kerbTicketCachePath=(NULL), userNa > me=(NULL)) error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError) > hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, > kerbTicketCachePath=(NULL), userNa > me=(NULL)) error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError) > > This error may be related to hadoop classpath > Hadoop env variables of launch_container.sh are as follows: > export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"} > export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"} > > run-PRIMARY_WORKER.sh is like: > export HADOOP_YARN_HOME= > export 
HADOOP_HDFS_HOME=/hadoop-3.1.0 > export HADOOP_CONF_DIR=$WORK_DIR > >
[jira] [Assigned] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-8698: Assignee: Zac Zhou > [Submarine] Failed to add hadoop dependencies in docker container when > submitting a submarine job > - > > Key: YARN-8698 > URL: https://issues.apache.org/jira/browse/YARN-8698 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zac Zhou >Assignee: Zac Zhou >Priority: Major > Attachments: YARN-8698.001.patch > > > When a standalone submarine tf job is submitted, the following error is got : > INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11) > INFO:tensorflow:Done calling model_fn. > INFO:tensorflow:Create CheckpointSaverHook. > hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, > kerbTicketCachePath=(NULL), userNa > me=(NULL)) error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError) > hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, > kerbTicketCachePath=(NULL), userNa > me=(NULL)) error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError) > > This error may be related to hadoop classpath > Hadoop env variables of launch_container.sh are as follows: > export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"} > export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"} > > run-PRIMARY_WORKER.sh is like: > export HADOOP_YARN_HOME= > export HADOOP_HDFS_HOME=/hadoop-3.1.0 > export HADOOP_CONF_DIR=$WORK_DIR > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For 
additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8698: - Issue Type: Sub-task (was: Bug) Parent: YARN-8135 > [Submarine] Failed to add hadoop dependencies in docker container when > submitting a submarine job > - > > Key: YARN-8698 > URL: https://issues.apache.org/jira/browse/YARN-8698 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zac Zhou >Priority: Major > Attachments: YARN-8698.001.patch > > > When a standalone submarine tf job is submitted, the following error is got : > INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11) > INFO:tensorflow:Done calling model_fn. > INFO:tensorflow:Create CheckpointSaverHook. > hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, > kerbTicketCachePath=(NULL), userNa > me=(NULL)) error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError) > hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, > kerbTicketCachePath=(NULL), userNa > me=(NULL)) error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError) > > This error may be related to hadoop classpath > Hadoop env variables of launch_container.sh are as follows: > export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"} > export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"} > > run-PRIMARY_WORKER.sh is like: > export HADOOP_YARN_HOME= > export HADOOP_HDFS_HOME=/hadoop-3.1.0 > export HADOOP_CONF_DIR=$WORK_DIR > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: 
yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8698: - Summary: [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job (was: Failed to add hadoop dependencies in docker container when submitting a submarine job) > [Submarine] Failed to add hadoop dependencies in docker container when > submitting a submarine job > - > > Key: YARN-8698 > URL: https://issues.apache.org/jira/browse/YARN-8698 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zac Zhou >Priority: Major > Attachments: YARN-8698.001.patch > > > When a standalone submarine tf job is submitted, the following error occurs: > INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11) > INFO:tensorflow:Done calling model_fn. > INFO:tensorflow:Create CheckpointSaverHook. > hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, > kerbTicketCachePath=(NULL), userName=(NULL)) error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError) > hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, > kerbTicketCachePath=(NULL), userName=(NULL)) error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError) > > This error may be related to the hadoop classpath. > Hadoop env variables of launch_container.sh are as follows: > export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"} > export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"} > export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"} > > run-PRIMARY_WORKER.sh looks like: > export HADOOP_YARN_HOME= > export HADOOP_HDFS_HOME=/hadoop-3.1.0 > export HADOOP_CONF_DIR=$WORK_DIR > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589226#comment-16589226 ] Eric Payne commented on YARN-8509: --
{code:title=UsersManager#computeUserLimit}
-Resource userLimitResource = Resources.max(resourceCalculator,
-    partitionResource,
-    Resources.divideAndCeil(resourceCalculator, resourceUsed,
-        usersSummedByWeight),
-    Resources.divideAndCeil(resourceCalculator,
-        Resources.multiplyAndRoundDown(currentCapacity, getUserLimit()),
-        100));
+Resource userLimitResource = Resources.multiplyAndRoundDown(queueCapacity,
+    getUserLimitFactor());
{code}
This is a drastic change that affects more than just preemption (the title of this JIRA is "Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent"). Forgive me if I didn't understand that this JIRA is trying to change the way the capacity scheduler calculates user limits. [~leftnoteasy], I thought that the goal of the algorithm within {{computeUserLimit}} is to slowly grow each queue once the queue is over its capacity so that resources can be assigned evenly. Are you okay with this change? > Total pending resource calculation in preemption should use user-limit factor > instead of minimum-user-limit-percent > --- > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: capacityscheduler > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch, YARN-8509.004.patch, YARN-8509.005.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total > pending resource based on user-limit percent and user-limit factor, which > caps the pending resource for each user to the minimum of user-limit pending and > actual pending.
This prevents a queue from taking on more pending resources to > achieve queue balance after all queues are satisfied with their ideal allocation. > > We need to change the logic to let queue pending resources go beyond the user limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
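The per-user capping described in the YARN-8509 report above can be sketched as follows. This is a hedged reading, not the actual {{LeafQueue#getTotalPendingResourcesConsideringUserLimit}} implementation; the class, method and parameter names here are illustrative only:

```java
// Illustrative sketch: each user's pending resource is capped at the
// headroom left under the user limit, then summed for the queue.
import java.util.LinkedHashMap;
import java.util.Map;

public class PendingCapSketch {
    // users maps user name -> {resource used, resource pending} (in MB, say).
    static long totalPendingConsideringUserLimit(long userLimit,
                                                 Map<String, long[]> users) {
        long total = 0;
        for (long[] u : users.values()) {
            long headroom = Math.max(userLimit - u[0], 0); // "user-limit pending"
            total += Math.min(u[1], headroom);             // capped at actual pending
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, long[]> users = new LinkedHashMap<>();
        users.put("a", new long[]{8, 10}); // used 8, pending 10
        users.put("b", new long[]{2, 1});  // used 2, pending 1
        // With userLimit=10: user a contributes min(10, 2)=2, user b min(1, 8)=1.
        System.out.println(totalPendingConsideringUserLimit(10, users)); // 3
    }
}
```

Under this reading, a queue's reported pending can never exceed the summed user-limit headroom, which is exactly why the JIRA argues it blocks queue rebalancing.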
[jira] [Commented] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router
[ https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589217#comment-16589217 ] Bibin A Chundatt commented on YARN-8699: Spark application submission depends on yarnClusterMetrics.
{code}
logInfo("Requesting a new application from cluster with %d NodeManagers"
  .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))
{code}
https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala > Add Yarnclient#yarnclusterMetrics API implementation in router > -- > > Key: YARN-8699 > URL: https://issues.apache.org/jira/browse/YARN-8699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Priority: Major > > Implement YarnclusterMetrics API in FederationClientInterceptor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router
Bibin A Chundatt created YARN-8699: -- Summary: Add Yarnclient#yarnclusterMetrics API implementation in router Key: YARN-8699 URL: https://issues.apache.org/jira/browse/YARN-8699 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bibin A Chundatt Implement YarnclusterMetrics API in FederationClientInterceptor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6972) Adding RM ClusterId in AppInfo
[ https://issues.apache.org/jira/browse/YARN-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589211#comment-16589211 ] Bibin A Chundatt commented on YARN-6972: Thank you [~tanujnay] for the patch. Minor comments: # Changes to TestRMWebServiceAppsNodelabel, TestRMHA, TestRMWebServicesAppsModification and TestRMWebServicesNodeLabels might not be required; the test cases pass without those changes. If the RM cluster id is null, the response will not have the rmclusterId field, right? # TestRMWebServicesApps is checking the field count. Add validation for the rmclusterId value too. > Adding RM ClusterId in AppInfo > -- > > Key: YARN-6972 > URL: https://issues.apache.org/jira/browse/YARN-6972 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Tanuj Nayak >Priority: Major > Attachments: YARN-6972.001.patch, YARN-6972.002.patch, > YARN-6972.003.patch, YARN-6972.004.patch, YARN-6972.005.patch, > YARN-6972.006.patch, YARN-6972.007.patch, YARN-6972.008.patch, > YARN-6972.009.patch, YARN-6972.010.patch, YARN-6972.011.patch, > YARN-6972.012.patch, YARN-6972.013.patch, YARN-6972.014.patch, > YARN-6972.015.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8696) FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8696: --- Attachment: YARN-8696.v2.patch > FederationInterceptor upgrade: home sub-cluster heartbeat async > --- > > Key: YARN-8696 > URL: https://issues.apache.org/jira/browse/YARN-8696 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Attachments: YARN-8696.v1.patch, YARN-8696.v2.patch > > > Today in _FederationInterceptor_, the heartbeat to home sub-cluster is > synchronous. After the heartbeat is sent out to home sub-cluster, it waits > for the home response to come back before merging and returning the (merged) > heartbeat result to back AM. If home sub-cluster is suffering from connection > issues, or down during an YarnRM master-slave switch, all heartbeat threads > in _FederationInterceptor_ will be blocked waiting for home response. As a > result, the successful UAM heartbeats from secondary sub-clusters will not be > returned to AM at all. Additionally, because of the fact that we kept the > same heartbeat responseId between AM and home RM, lots of tricky handling are > needed regarding the responseId resync when it comes to > _FederationInterceptor_ (part of AMRMProxy, NM) work preserving restart > (YARN-6127, YARN-1336), home RM master-slave switch etc. > In this patch, we change the heartbeat to home sub-cluster to asynchronous, > same as the way we handle UAM heartbeats in secondaries. So that any > sub-cluster down or connection issues won't impact AM getting responses from > other sub-clusters. The responseId is also managed separately for home > sub-cluster and AM, and they increment independently. The resync logic > becomes much cleaner. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
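The async-heartbeat idea described in YARN-8696 above can be illustrated with a small stand-alone sketch. This is not the FederationInterceptor code; the merge policy and all names below are assumptions made for illustration only:

```java
// Sketch: each sub-cluster heartbeat runs on its own future, and the merge
// step only folds in responses that have already arrived. A hung or failed
// home sub-cluster therefore cannot block successful UAM responses from
// secondary sub-clusters.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class AsyncHeartbeatSketch {
    // Merge whatever heartbeat responses have completed; skip pending/failed ones.
    static List<String> mergeAvailable(List<CompletableFuture<String>> responses) {
        List<String> merged = new ArrayList<>();
        for (CompletableFuture<String> r : responses) {
            if (r.isDone() && !r.isCompletedExceptionally()) {
                merged.add(r.join());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        CompletableFuture<String> home = new CompletableFuture<>(); // home RM down: never completes
        CompletableFuture<String> sc1 =
                CompletableFuture.completedFuture("sc-1 response");
        // Only the secondary's response is returned to the AM; the home
        // response is merged on a later heartbeat, once it arrives.
        System.out.println(mergeAvailable(List.of(home, sc1)));
    }
}
```

The same decoupling is what lets the home responseId and the AM-facing responseId increment independently, as the JIRA describes.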
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589171#comment-16589171 ] Naganarasimha G R commented on YARN-7863: - Thanks [~cheersyang] for the clarifications; at the same time I was discussing with Sunil too. {quote} PC doesn't affect any logic how scheduler selects requests, it is still how it is handled now. A PC is simply checked twice when 1) creating an allocation proposal on a node and 2) at commit phase against a specific node. {quote} I am not saying that the changes in this Jira alone introduce any implications. My point is: earlier, when partitions were introduced, it was easy to determine pending resources per partition per queue using the earlier resource-request API. Now, with the new API introduced with PC (not just from this Jira alone), there is no way to find out for a given queue how much pending resource there is in each partition it can access. This is because a PC can have a partition OR'd with an allocation tag, or, with this Jira, OR'd with attributes. The impact would be that the cluster admin will not be able to plan resources per partition per queue. I am also not able to envisage a scenario where a partition needs to be OR'd with allocation tags or attributes. {quote}No, not necessarily. Just to keep the changes clean and incremental, we can allow this form for now. Because this is the spec we used for distributed shell. Since we don't have an --allocationTags argument. A "foo=3" is the only way to specify allocation tags right now. {quote} Agreed, my bad; the reason I got confused is that all the parsing logic for the distributed shell's expression is present in the "hadoop-yarn-api" project, which implies that this is the API to be used by other apps too. {quote}I want to reinforce the phase-target (for the merge), we want node-attributes can be integrated into PC and support simple operation "=" and "!=". {quote} Yes, I concur with you on this, as long as we are able to clearly capture for others what kind of Java API needs to be used to specify node attributes. [~sunilg], could there be a simple test case or example which captures how to use the Java API to specify node attributes for a given SchedulingRequest, i.e. without any of these expression DSLs? Also, I could not see a test case where CS handles scheduling of containers with a PC having attributes, so that the modifications in PlacementConstraintsUtil are tested. > Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7863-YARN-3409.002.patch, > YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, > YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, > YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, > YARN-7863.v0.patch > > > This Jira will track to *Modify existing placement constraints to support > node attributes.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589116#comment-16589116 ] Weiwei Yang commented on YARN-7863: --- Hi [~Naganarasimha] Regarding some of your comments {quote}IIUC we can specify allocation tags, attribute expression and partition label all in single expression ? {quote} Yes, this is supported. User can specify constraints against allocation tags, node attributes and partition in a single expression (a conjunction expression composed by AND or OR). E.g place a container on a node has allocation tag X and javaVersion (node-attribute) = 1.8. {quote}If its the case then how are we going select the outstanding/ pending requests for a given queue for a given partition, as there could be OR with allocation tags or attribute expression right ? {quote} PC doesn't affect any logic how scheduler selects requests, it is still how it is handled now. A PC is simply checked twice when 1) creating an allocation proposal on a node and 2) at commit phase against a specific node. {quote}Allocation tags are created even if we want to specify attribute expression ? {quote} No, not necessarily. Just to keep the changes clean and incremental, we can allow this form for now. Because this is the spec we used for distributed shell. Since we don't have an --allocationTags argument. A "foo=3" is the only way to specify allocation tags right now. A follow-up task is to make that optional. But I am not sure why even a single tag without container number is supported, maybe [~sunilg] can comment more. I want to reinforce the phase-target (for the merge), we want node-attributes can be integrated into PC and support simple operation "=" and "!=". We want to extend DS to support node-attributes PC expressions for testing, but with minimal changes to the existing placement spec. For real users, their interface will be java API or native service spec, not this spec in DS. Hope that makes sense. 
> Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7863-YARN-3409.002.patch, > YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, > YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, > YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, > YARN-7863.v0.patch > > > This Jira will track to *Modify existing placement constraints to support > node attributes.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589075#comment-16589075 ] Naganarasimha G R commented on YARN-7863: - Hi [~sunilg], now I am confused, since we have everything in one single expression. * IIUC, we can specify allocation tags, an attribute expression and a partition label all in a single expression? * If that is the case, then how are we going to select the outstanding/pending requests for a given queue for a given partition, as there could be an OR with allocation tags or an attribute expression, right? I think this issue already exists when ORing is involved with allocation tags. * Based on the test cases in TestPlacementConstraintParser.testParseNodeAttributeSpec, allocation tags are created even if we only want to specify an attribute expression? E.g. in "xyz,in,rm.yarn.io/foo=true", xyz is an allocation tag? If so, users will be really confused about what to pass and what not! * If I specify "IN" and have an attribute expression like "xyz,in,rm.yarn.io/foo{color:#d04437}*!=*{color}true", is it valid? In the future, when we come up with more operators, it would not make sense. I would suggest going with "," as the separator. I am not sure how many cases there would be where we want to specify all the constraints in the same expression, but we are making users' lives complex with such a complex DSL. 
> Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7863-YARN-3409.002.patch, > YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, > YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, > YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, > YARN-7863.v0.patch > > > This Jira will track to *Modify existing placement constraints to support > node attributes.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service
[ https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589066#comment-16589066 ] Eric Yang commented on YARN-8675: - Further analysis of the problem indicates that the registry DNS domain is set to mycluster.com, while the host-level domain is example.com. If the hostname is host1.example.com, a Spark-on-YARN workload will start the container as host1.mycluster.com. host1.mycluster.com is unresolvable because no registryDNS entry is written to zookeeper. Without using the YARN service API, there is no AM logic that handles registration of the hostname-to-IP mapping. This is the reason it failed. To handle the net=host situation properly without using the YARN service API, the registry DNS domain must be set to the same value as the host-level domain, which is example.com. The system administrator must configure the registryDNS domain properly to permit applications to use the host-level domain. This ensures the decoupling of the infrastructure cluster (YARN) and the workload cluster (YARN apps): an application does not get to impersonate the infrastructure cluster unless explicitly allowed. This is a feature, not a bug. > Setting hostname of docker container breaks with "host" networking mode for > Apps which do not run as a YARN service > --- > > Key: YARN-8675 > URL: https://issues.apache.org/jira/browse/YARN-8675 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Major > Labels: Docker > > Applications like the Spark AM currently do not run as a YARN service and > setting hostname breaks driver/executor communication if docker version > >=1.13.1 , especially with wire-encryption turned on. > YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could > have a mix of YARN service/native Applications. > The proposal is to not set the hostname when "host" networking mode is > enabled. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service
[ https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang resolved YARN-8675. - Resolution: Not A Problem > Setting hostname of docker container breaks with "host" networking mode for > Apps which do not run as a YARN service > --- > > Key: YARN-8675 > URL: https://issues.apache.org/jira/browse/YARN-8675 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Major > Labels: Docker > > Applications like the Spark AM currently do not run as a YARN service and > setting hostname breaks driver/executor communication if docker version > >=1.13.1 , especially with wire-encryption turned on. > YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could > have a mix of YARN service/native Applications. > The proposal is to not set the hostname when "host" networking mode is > enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service
[ https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588998#comment-16588998 ] Billie Rinaldi commented on YARN-8675: -- Perhaps we should always set the hostname when the AM has provided one through the YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_HOSTNAME env var, but place conditions on when the DockerLinuxContainerRuntime makes up a default hostname to set. We could remove the default hostname entirely, or just set it when net != host. Another probably ill-advised option would be to have the runtime populate the registry when the runtime sets a default hostname and RegistryDNS is enabled. But then we'd have to figure out a way to clean up the registry later. > Setting hostname of docker container breaks with "host" networking mode for > Apps which do not run as a YARN service > --- > > Key: YARN-8675 > URL: https://issues.apache.org/jira/browse/YARN-8675 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Major > Labels: Docker > > Applications like the Spark AM currently do not run as a YARN service and > setting hostname breaks driver/executor communication if docker version > >=1.13.1 , especially with wire-encryption turned on. > YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could > have a mix of YARN service/native Applications. > The proposal is to not set the hostname when "host" networking mode is > enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
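The conditional Billie proposes above can be sketched in a few lines. The method and parameter names here are hypothetical, not the actual DockerLinuxContainerRuntime code:

```java
// Sketch of the proposed policy: always honor an AM-provided hostname
// (YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_HOSTNAME), but only make up a
// default hostname when the container is NOT on the host network.
import java.util.Optional;

public class HostnameSketch {
    static Optional<String> hostnameToSet(String envHostname, String network,
                                          String defaultHostname) {
        if (envHostname != null && !envHostname.isEmpty()) {
            return Optional.of(envHostname);          // AM asked for it: honor it
        }
        if (!"host".equals(network)) {
            return Optional.of(defaultHostname);      // bridge/custom net: safe to set
        }
        return Optional.empty();                      // host networking: keep node hostname
    }

    public static void main(String[] args) {
        System.out.println(hostnameToSet(null, "host", "ctr-1.mycluster.com"));
        System.out.println(hostnameToSet(null, "bridge", "ctr-1.mycluster.com"));
        System.out.println(hostnameToSet("am-set.example.com", "host", "ctr-1.mycluster.com"));
    }
}
```

Under this policy, the Spark AM case in the bug report (host networking, no AM-provided hostname) would leave the node's own hostname intact.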
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588996#comment-16588996 ] Weiwei Yang commented on YARN-7863: --- Hi [~sunilg] Thanks for the updates. I think v8 patch has addressed most of my concerns. Two comments: PlacementConstraintsUtil.java canSatisfyNodeConstraintExpresssion: looks like the logic only supports affinity to node-attributes, does it support anti-affinity? e.g {{targetNotIn(NODE, nodeAttribute("java", "1.8"))}}, can container with such PC be allocated to nodes where {{java != 1.8}}? If it is not straightforward to support this with {{targetIn}} and {{targetNotIn}}, I think we can add a API something like {{targetNodeAttribute(Operator.EQ, "java", "1.8")}} ? For this version, we can claim only to support {{EQ}} and {{NE}}. What do you think? TestPlacementConstraints.java line 72 and 85: why it is still {{nodeAttribute("java", "java=1.8")}}? The value should be "1.8" alone right? Thanks > Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7863-YARN-3409.002.patch, > YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, > YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, > YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, > YARN-7863.v0.patch > > > This Jira will track to *Modify existing placement constraints to support > node attributes.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
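The "=" / "!=" node-attribute check proposed above (e.g. a {{targetNodeAttribute(Operator.EQ, "java", "1.8")}}-style API) would behave like the following sketch. All names here are hypothetical, not the actual PlacementConstraintsUtil API:

```java
// Sketch: a container's node-attribute constraint is satisfied when the
// node's attribute value matches (EQ) or differs from (NE) the target value.
// A missing attribute counts as "not equal".
import java.util.Map;

public class AttributeMatchSketch {
    enum Op { EQ, NE }

    static boolean satisfies(Map<String, String> nodeAttrs,
                             String key, Op op, String value) {
        boolean eq = value.equals(nodeAttrs.get(key));
        return op == Op.EQ ? eq : !eq;
    }

    public static void main(String[] args) {
        Map<String, String> node = Map.of("java", "1.8");
        System.out.println(satisfies(node, "java", Op.EQ, "1.8")); // true
        System.out.println(satisfies(node, "java", Op.NE, "1.8")); // false
        System.out.println(satisfies(node, "python", Op.NE, "3")); // true (attr absent)
    }
}
```

This also shows why anti-affinity ({{NE}}) is worth supporting explicitly: it is not expressible as the complement of a single {{EQ}} check once multiple attributes are involved.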
[jira] [Assigned] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger reassigned YARN-7644: - Assignee: Chandni Singh (was: Eric Badger) [~csingh], assigned to you. Thanks for picking this up > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
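One direction for the fix described in YARN-7644 above is to move the blocking {{docker stop}} off the event-handler thread. This is a sketch under assumptions, not the NodeManager implementation:

```java
// Sketch: run blocking "docker stop" commands on a dedicated pool so the
// ContainerLaunch handler returns immediately instead of waiting up to 10s
// per container; slow stops no longer back up kill events or new launches.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class AsyncDockerStopSketch {
    private final ExecutorService stopPool = Executors.newFixedThreadPool(4);

    // dockerStop stands in for the blocking CLI call ("docker stop <id>").
    Future<?> requestStop(Runnable dockerStop) {
        return stopPool.submit(dockerStop); // returns immediately
    }

    void shutdown() throws InterruptedException {
        stopPool.shutdown();
        stopPool.awaitTermination(30, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        AsyncDockerStopSketch nm = new AsyncDockerStopSketch();
        for (int i = 0; i < 8; i++) {
            // Each stand-in "stop" blocks 200 ms; all 8 are submitted at once
            // and the blocking happens on the pool, not the caller.
            nm.requestStop(() -> {
                try { Thread.sleep(200); } catch (InterruptedException ignored) {}
            });
        }
        System.out.println("all stops submitted without blocking the handler");
        nm.shutdown();
    }
}
```

The force-kill fallback after the 10-second timeout would then hang off the returned {{Future}} rather than the handler thread.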
[jira] [Comment Edited] (YARN-8670) Support scheduling request for SLS input and attributes for Node in SLS
[ https://issues.apache.org/jira/browse/YARN-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588925#comment-16588925 ] Weiwei Yang edited comment on YARN-8670 at 8/22/18 2:36 PM: Hi [~Sichen Zhao] {quote}it seems all the improvements is depending on YARN-3409? {quote} Only if you want to specify node-attributes, that will depend on YARN-3409. Good news is that branch is close to merge, we are trying to get it done in 2 - 3 weeks. Simple PC with allocation tags is supported in trunk code. {quote}PC and allocataiontags shouldn't add to container level. {quote} The PC is specified in SchedulingRequest, so it is request level. It should be enough right? {quote}So maybe we need create a new task class. PC, allocataiontags and taskcontainer are members of task class {quote} Before going into the implementation details, I would love to know your idea how to specify PC in request level. Curren SLS supports to launch workload from SLS/Rumen/Synth traces. YARN-8007 was to support in Synth traces, so it was only able to test simple affinity/anti-affinity PCs. Therefore, could you please share the idea to support a large number of jobs/requests with PCs in traces? Thanks was (Author: cheersyang): Hi [~Sichen Zhao] {quote}it seems all the improvements is depending on YARN-3409? {quote} Only if you want to specify node-attributes, that will depend on YARN-3409. Good news is that branch is close to merge, we are trying to get it done in 2 - 3 weeks. {quote}PC and allocataiontags shouldn't add to container level. {quote} The PC is specified in SchedulingRequest, so it is request level. It should be enough right? {quote}So maybe we need create a new task class. PC, allocataiontags and taskcontainer are members of task class {quote} Before going into the implementation details, I would love to know your idea how to specify PC in request level. Curren SLS supports to launch workload from SLS/Rumen/Synth traces. 
YARN-8007 was to support in Synth traces, so it was only able to test simple affinity/anti-affinity PCs. Therefore, could you please share the idea to support a large number of jobs/requests with PCs in traces? Thanks > Support scheduling request for SLS input and attributes for Node in SLS > --- > > Key: YARN-8670 > URL: https://issues.apache.org/jira/browse/YARN-8670 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler-load-simulator >Affects Versions: YARN-3409 >Reporter: Sichen zhao >Priority: Major > Fix For: YARN-3409 > > > YARN-3409 introduces placement constraint, Currently SLS does not support > specify placement constraint. > YARN-8007 support specifying placement constraint for task containers in SLS. > But there are still > some room for improvement: > # YARN-8007 only support placement constraint for the jobs level. In fact, > the more flexible way is support placement constraint for the tasks level. > # In most scenarios, node itself has some characteristics, called attribute, > which is not supported in SLS. So we can add the attribute on Nodes. > # YARN-8007 use the SYNTH as schedulingrequest input. But SYNTH can't create > a large number of specific resource requests. We wanna create a new > schedulingrequest input format(like sis format) for the more authentic input. > We can add some field in sis format, and this is schedulingrequest input > format. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8670) Support scheduling request for SLS input and attributes for Node in SLS
[ https://issues.apache.org/jira/browse/YARN-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588925#comment-16588925 ] Weiwei Yang commented on YARN-8670: --- Hi [~Sichen Zhao] {quote}it seems all the improvements is depending on YARN-3409? {quote} Only if you want to specify node-attributes, that will depend on YARN-3409. Good news is that branch is close to merge, we are trying to get it done in 2 - 3 weeks. {quote}PC and allocataiontags shouldn't add to container level. {quote} The PC is specified in SchedulingRequest, so it is request level. It should be enough right? {quote}So maybe we need create a new task class. PC, allocataiontags and taskcontainer are members of task class {quote} Before going into the implementation details, I would love to know your idea how to specify PC in request level. Curren SLS supports to launch workload from SLS/Rumen/Synth traces. YARN-8007 was to support in Synth traces, so it was only able to test simple affinity/anti-affinity PCs. Therefore, could you please share the idea to support a large number of jobs/requests with PCs in traces? Thanks > Support scheduling request for SLS input and attributes for Node in SLS > --- > > Key: YARN-8670 > URL: https://issues.apache.org/jira/browse/YARN-8670 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler-load-simulator >Affects Versions: YARN-3409 >Reporter: Sichen zhao >Priority: Major > Fix For: YARN-3409 > > > YARN-3409 introduces placement constraint, Currently SLS does not support > specify placement constraint. > YARN-8007 support specifying placement constraint for task containers in SLS. > But there are still > some room for improvement: > # YARN-8007 only support placement constraint for the jobs level. In fact, > the more flexible way is support placement constraint for the tasks level. > # In most scenarios, node itself has some characteristics, called attribute, > which is not supported in SLS. 
So we can add the attribute on Nodes. > # YARN-8007 use the SYNTH as schedulingrequest input. But SYNTH can't create > a large number of specific resource requests. We wanna create a new > schedulingrequest input format(like sis format) for the more authentic input. > We can add some field in sis format, and this is schedulingrequest input > format. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8670) Support scheduling request for SLS input and attributes for Node in SLS
[ https://issues.apache.org/jira/browse/YARN-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8670: -- Target Version/s: (was: YARN-3409) > Support scheduling request for SLS input and attributes for Node in SLS > --- > > Key: YARN-8670 > URL: https://issues.apache.org/jira/browse/YARN-8670 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler-load-simulator >Affects Versions: YARN-3409 >Reporter: Sichen zhao >Priority: Major > Fix For: YARN-3409 > > > YARN-3409 introduces placement constraint, Currently SLS does not support > specify placement constraint. > YARN-8007 support specifying placement constraint for task containers in SLS. > But there are still > some room for improvement: > # YARN-8007 only support placement constraint for the jobs level. In fact, > the more flexible way is support placement constraint for the tasks level. > # In most scenarios, node itself has some characteristics, called attribute, > which is not supported in SLS. So we can add the attribute on Nodes. > # YARN-8007 use the SYNTH as schedulingrequest input. But SYNTH can't create > a large number of specific resource requests. We wanna create a new > schedulingrequest input format(like sis format) for the more authentic input. > We can add some field in sis format, and this is schedulingrequest input > format. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8685) Add containers query support for nodes/node REST API in RMWebServices
[ https://issues.apache.org/jira/browse/YARN-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588896#comment-16588896 ] Weiwei Yang commented on YARN-8685: --- Hi [~Tao Yang] {quote}There is a ContainerInfo class in hadoop-yarn-server-common module, the patch can share this class with adding several fields like allocationRequestId/version/allocationTags {quote} Correct. How about my suggestion about adding a new endpoint in RM? Does that make sense to you? Thanks > Add containers query support for nodes/node REST API in RMWebServices > - > > Key: YARN-8685 > URL: https://issues.apache.org/jira/browse/YARN-8685 > Project: Hadoop YARN > Issue Type: Improvement > Components: restapi > Affects Versions: 3.2.0 > Reporter: Tao Yang > Assignee: Tao Yang > Priority: Major > Attachments: YARN-8685.001.patch > > > Currently we can only query running containers from the NM containers REST API, > but can't get valid containers which are in the ALLOCATED/ACQUIRED state. We > have a requirement to get all containers allocated on specified nodes for > debugging. I want to add an "includeContainers" query param (default false) > to the nodes/node REST API in RMWebServices, so that we can get valid containers > on nodes if "includeContainers=true" is specified.
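A client-side sketch of how the proposed query could be consumed. The `includeContainers` parameter comes from the description above, but the exact shape of the extended JSON response is not fixed yet, so the field names (`nodes`, `node`, `containers`, `state`, `containerId`) and the sample payload below are assumptions for illustration:

```python
import json

def containers_by_state(nodes_json, states=("ALLOCATED", "ACQUIRED")):
    """Filter containers from a nodes REST response by container state.

    Assumes the extended response nests a 'containers' list under each
    node entry -- these field names are illustrative, not the final API.
    """
    result = []
    for node in nodes_json.get("nodes", {}).get("node", []):
        for c in node.get("containers", []):
            if c.get("state") in states:
                result.append((node["id"], c["containerId"], c["state"]))
    return result

# Hypothetical request URL using the proposed query parameter:
url = "http://rm-host:8088/ws/v1/cluster/nodes?includeContainers=true"

# A made-up response payload to exercise the filter:
sample = json.loads("""
{"nodes": {"node": [
  {"id": "nm-1:45454", "containers": [
    {"containerId": "container_1_0001_01_000002", "state": "ALLOCATED"},
    {"containerId": "container_1_0001_01_000003", "state": "RUNNING"}
  ]}
]}}
""")
print(containers_by_state(sample))
```

This kind of filter is what the debugging use case in the description needs: RUNNING containers are already visible via the NM API, so only ALLOCATED/ACQUIRED ones are interesting here.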
[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588777#comment-16588777 ] Antal Bálint Steinbach commented on YARN-8468: -- Hi [~haibochen], Thank you for your feedback. All of the points you suggested are fixed. > Limit container sizes per queue in FairScheduler > > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler > Affects Versions: 3.1.0 > Reporter: Antal Bálint Steinbach > Assignee: Antal Bálint Steinbach > Priority: Critical > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, > YARN-8468.005.patch, YARN-8468.006.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers, cannot be limited per queue, and is not scheduler dependent. > > The goal of this ticket is to allow this value to be set on a per-queue basis. > > The use case: a user has two pools, one for ad hoc jobs and one for enterprise > apps, and wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. > yarn.scheduler.maximum-allocation-mb sets a default maximum > container size for all queues; the per-queue maximum is set with the > "maxContainerResources" queue config value. > > Suggested solution: > > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf); this will cover dynamically created queues. > * if it is set on the root it would override the scheduler setting, and we should > not allow that. > * make sure that the queue resource cap cannot be larger than the scheduler max > resource cap in the config. 
> * implement getMaximumResourceCapability(String queueName) in the > FairScheduler > * implement getMaximumResourceCapability() in both FSParentQueue and > FSLeafQueue > * expose the setting in the queue information in the RM web UI. > * expose the setting in the metrics etc. for the queue. > * write JUnit tests. > * update the scheduler documentation.
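A sketch of how the proposed per-queue cap could appear in a Fair Scheduler allocation file. The `maxContainerResources` element name comes from the description above; the queue names and the "mb/vcores" value syntax mirror existing Fair Scheduler resource settings and are assumptions, not the final configuration format:

```xml
<?xml version="1.0"?>
<allocations>
  <queue name="adhoc">
    <!-- Proposed per-queue cap: ad hoc jobs limited to small containers -->
    <maxContainerResources>2048 mb, 2 vcores</maxContainerResources>
  </queue>
  <queue name="enterprise">
    <!-- No per-queue cap: requests fall back to the global
         yarn.scheduler.maximum-allocation-mb / -vcores limits -->
  </queue>
</allocations>
```

Per the suggested solution, a value here larger than the scheduler-wide maximum would be rejected (or clamped) at config-load time, and setting it on the root queue would be disallowed.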
[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588775#comment-16588775 ] genericqa commented on YARN-8649: - (/) +1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 19m 30s | trunk passed |
| +1 | compile | 1m 4s | trunk passed |
| +1 | checkstyle | 0m 29s | trunk passed |
| +1 | mvnsite | 0m 39s | trunk passed |
| +1 | shadedclient | 12m 9s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 54s | trunk passed |
| +1 | javadoc | 0m 24s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 35s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| +1 | checkstyle | 0m 25s | the patch passed |
| +1 | mvnsite | 0m 32s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 26s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 59s | the patch passed |
| +1 | javadoc | 0m 22s | the patch passed |
|| Other Tests ||
| +1 | unit | 18m 44s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 71m 0s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8649 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936636/YARN-8649_4.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 1f7cbdde302c 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8184739 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21659/testReport/ |
| Max. process+thread count | 335 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21659/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated. > Similar as YARN-4355:NPE while proces
[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588772#comment-16588772 ] Antal Bálint Steinbach commented on YARN-8468: -- The failing test is a flaky test: https://issues.apache.org/jira/browse/YARN-8433 > Limit container sizes per queue in FairScheduler
[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588748#comment-16588748 ] genericqa commented on YARN-8468: - (x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 11 new or modified test files. |
|| trunk Compile Tests ||
| 0 | mvndep | 3m 29s | Maven dependency ordering for branch |
| +1 | mvninstall | 23m 25s | trunk passed |
| +1 | compile | 8m 38s | trunk passed |
| +1 | checkstyle | 1m 35s | trunk passed |
| +1 | mvnsite | 1m 22s | trunk passed |
| +1 | shadedclient | 14m 39s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 20s | trunk passed |
| +1 | javadoc | 0m 59s | trunk passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 57s | the patch passed |
| +1 | compile | 7m 11s | the patch passed |
| +1 | javac | 7m 11s | the patch passed |
| -0 | checkstyle | 1m 31s | hadoop-yarn-project/hadoop-yarn: The patch generated 41 new + 602 unchanged - 15 fixed = 643 total (was 617) |
| +1 | mvnsite | 1m 15s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 6s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 23s | the patch passed |
| +1 | javadoc | 0m 55s | the patch passed |
|| Other Tests ||
| -1 | unit | 65m 46s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | unit | 0m 21s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 38s | The patch does not generate ASF License warnings. |
| | | 147m 17s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8468 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936607/YARN-8468.006.patch
[jira] [Updated] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8649: Attachment: YARN-8649_4.patch > Similar as YARN-4355:NPE while processing localizer heartbeat > - > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug > Affects Versions: 3.1.1 > Reporter: lujie > Assignee: lujie > Priority: Major > Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, > YARN-8649_4.patch, hadoop-hires-nodemanager-hadoop11.log > > > I have noticed that a nodemanager was getting NPEs while tearing down. The > reason may be similar to YARN-4355, which was reported by Jason Lowe.
[jira] [Comment Edited] (YARN-8670) Support scheduling request for SLS input and attributes for Node in SLS
[ https://issues.apache.org/jira/browse/YARN-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588645#comment-16588645 ] Sichen zhao edited comment on YARN-8670 at 8/22/18 10:07 AM: - Hi, sorry for the late reply. It seems all the improvements depend on YARN-3409? The master branch does not support scheduling requests yet. For the 1st improvement, my thought was to add the PC and allocation tags on the task container, but after reading YARN-8007, the PC and allocation tags shouldn't be added at the container level. So maybe we need to create a new Task class, with the PC, allocation tags and task container as its members. What do you think? was (Author: sichen zhao): Hi, sorry for the late reply. It seems all the improvements depend on YARN-3409? The master branch does not support scheduling requests yet. > Support scheduling request for SLS input and attributes for Node in SLS
[jira] [Commented] (YARN-8670) Support scheduling request for SLS input and attributes for Node in SLS
[ https://issues.apache.org/jira/browse/YARN-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588645#comment-16588645 ] Sichen zhao commented on YARN-8670: --- Sorry for the late reply. It seems all the improvements depend on YARN-3409? The master branch does not support scheduling requests yet. > Support scheduling request for SLS input and attributes for Node in SLS
[jira] [Comment Edited] (YARN-8670) Support scheduling request for SLS input and attributes for Node in SLS
[ https://issues.apache.org/jira/browse/YARN-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588645#comment-16588645 ] Sichen zhao edited comment on YARN-8670 at 8/22/18 9:57 AM: Hi, sorry for the late reply. It seems all the improvements depend on YARN-3409? The master branch does not support scheduling requests yet. was (Author: sichen zhao): Sorry for the late reply. It seems all the improvements depend on YARN-3409? The master branch does not support scheduling requests yet. > Support scheduling request for SLS input and attributes for Node in SLS
[jira] [Updated] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-8468: - Attachment: YARN-8468.006.patch > Limit container sizes per queue in FairScheduler
[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588477#comment-16588477 ] genericqa commented on YARN-8649: - (x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 30s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 17m 7s | trunk passed |
| +1 | compile | 0m 53s | trunk passed |
| +1 | checkstyle | 0m 24s | trunk passed |
| +1 | mvnsite | 0m 35s | trunk passed |
| +1 | shadedclient | 11m 22s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 51s | trunk passed |
| +1 | javadoc | 0m 21s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 35s | the patch passed |
| +1 | compile | 0m 52s | the patch passed |
| +1 | javac | 0m 52s | the patch passed |
| +1 | checkstyle | 0m 23s | the patch passed |
| +1 | mvnsite | 0m 30s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 11m 41s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 57s | the patch passed |
| +1 | javadoc | 0m 23s | the patch passed |
|| Other Tests ||
| +1 | unit | 19m 1s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 66m 54s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8649 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936581/YARN-8649_3.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux da047436d45a 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8184739 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/21657/artifact/out/whitespace-eol.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21657/testReport/ |
| Max. process+thread count | 440 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreComm