[jira] [Commented] (YARN-8685) Add containers query support for nodes/node REST API in RMWebServices
[ https://issues.apache.org/jira/browse/YARN-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588371#comment-16588371 ]

Tao Yang commented on YARN-8685:

[~cheersyang], thanks for your suggestions; they make sense to me. There is a ContainerInfo class in the hadoop-yarn-server-common module. The patch can reuse this class by adding several fields such as allocationRequestId, version, and allocationTags. Is that right?

> Add containers query support for nodes/node REST API in RMWebServices
> ---------------------------------------------------------------------
>
> Key: YARN-8685
> URL: https://issues.apache.org/jira/browse/YARN-8685
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: restapi
> Affects Versions: 3.2.0
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: YARN-8685.001.patch
>
> Currently we can only query running containers from the NM containers REST API,
> but we can't get the valid containers that are in the ALLOCATED/ACQUIRED state.
> We need to get all containers allocated on specified nodes for debugging.
> I want to add an "includeContainers" query param (default false) to the
> nodes/node REST API in RMWebServices, so that we can get the valid containers
> on nodes if "includeContainers=true" is specified.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
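A minimal sketch of how a client might build the request discussed in this issue. The `includeContainers` parameter is the one proposed in the description and is not assumed to exist in any released API; the host name and port are placeholders, and `NodesQuerySketch` and `nodesUri` are hypothetical names used only for illustration. The `/ws/v1/cluster/nodes` path is the ResourceManager's existing nodes endpoint.

```java
import java.net.URI;

// Illustrative only: builds the URI for the RM nodes endpoint, optionally
// adding the "includeContainers" query parameter proposed in YARN-8685.
class NodesQuerySketch {

    // rmAddress is a placeholder such as "rm.example.com:8088".
    static URI nodesUri(String rmAddress, boolean includeContainers) {
        String base = "http://" + rmAddress + "/ws/v1/cluster/nodes";
        // The proposal defaults the parameter to false, so it is only
        // appended when the caller explicitly asks for containers.
        return URI.create(includeContainers ? base + "?includeContainers=true" : base);
    }

    public static void main(String[] args) {
        System.out.println(nodesUri("rm.example.com:8088", true));
    }
}
```

Leaving the parameter off by default matches the proposal's "default false" and keeps existing callers of the endpoint unaffected.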
[jira] [Commented] (YARN-8698) Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588372#comment-16588372 ]

genericqa commented on YARN-8698:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 24s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 23m 49s | trunk passed |
| +1 | compile | 0m 24s | trunk passed |
| +1 | checkstyle | 0m 19s | trunk passed |
| +1 | mvnsite | 0m 27s | trunk passed |
| +1 | shadedclient | 12m 4s | branch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 0m 35s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine in trunk has 4 extant Findbugs warnings. |
| +1 | javadoc | 0m 21s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 25s | the patch passed |
| +1 | compile | 0m 20s | the patch passed |
| +1 | javac | 0m 20s | the patch passed |
| -0 | checkstyle | 0m 12s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5) |
| +1 | mvnsite | 0m 22s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 28s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 38s | the patch passed |
| +1 | javadoc | 0m 17s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 32s | hadoop-yarn-submarine in the patch passed. |
| +1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
| | | 54m 26s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8698 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936558/YARN-8698.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 01da4d8f9c08 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e557c6b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/21656/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-submarine-warnings.html |
| checkstyle |
[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588356#comment-16588356 ]

genericqa commented on YARN-8649:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 39s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 28s | trunk passed |
| +1 | compile | 1m 28s | trunk passed |
| +1 | checkstyle | 0m 35s | trunk passed |
| +1 | mvnsite | 0m 45s | trunk passed |
| +1 | shadedclient | 12m 51s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 54s | trunk passed |
| +1 | javadoc | 0m 24s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 35s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| -0 | checkstyle | 0m 24s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 2 new + 234 unchanged - 0 fixed = 236 total (was 234) |
| +1 | mvnsite | 0m 33s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 58s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 10s | the patch passed |
| +1 | javadoc | 0m 23s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 18m 30s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
| | | 72m 52s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8649 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936555/YARN-8649_2.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 1b2539cf8401 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e557c6b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21655/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21655/testReport/ |
| Max. process+thread count | 302 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U:
[jira] [Comment Edited] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588326#comment-16588326 ]

potato edited comment on YARN-8649 at 8/22/18 3:20 AM:

Shutting down many nodes gracefully can cause a large number of NPEs, which disrupts our log analysis tool! Fixing this issue may help us reduce the NPEs.

was (Author: potato):
Gracefully shutting down many nodes can cause a large number of NPEs, which disrupts our log analysis tool!

> Similar as YARN-4355:NPE while processing localizer heartbeat
> -------------------------------------------------------------
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: lujie
> Assignee: lujie
> Priority: Major
> Attachments: YARN-8649.patch, YARN-8649_2.patch, hadoop-hires-nodemanager-hadoop11.log
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The
> reason may be similar to YARN-4355, which was reported by [# Jason Lowe].
[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588326#comment-16588326 ]

potato commented on YARN-8649:

Gracefully shutting down many nodes can cause a large number of NPEs, which disrupts our log analysis tool!

> Similar as YARN-4355:NPE while processing localizer heartbeat
> -------------------------------------------------------------
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: lujie
> Assignee: lujie
> Priority: Major
> Attachments: YARN-8649.patch, YARN-8649_2.patch, hadoop-hires-nodemanager-hadoop11.log
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The
> reason may be similar to YARN-4355, which was reported by [# Jason Lowe].
[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588306#comment-16588306 ]

lujie commented on YARN-8649:

Hi [~jlowe]:
# In the new patch, "getPathForLocalization" returns null if "rsrc == null" (there is also a log statement that explains why null is returned).
# In "processHeartbeat" and "addResource", which use the return value of "getPathForLocalization", I added a null check. The null check prevents an unnecessary download.

> Similar as YARN-4355:NPE while processing localizer heartbeat
> -------------------------------------------------------------
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: lujie
> Assignee: lujie
> Priority: Major
> Attachments: YARN-8649.patch, YARN-8649_2.patch, hadoop-hires-nodemanager-hadoop11.log
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The
> reason may be similar to YARN-4355, which was reported by [# Jason Lowe].
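The two steps described in this comment can be sketched as follows. This is a simplified illustration, not the actual patch: the method names `getPathForLocalization` and `processHeartbeat` come from the comment, but their signatures here, the `LocalizerNullCheckSketch` class, the string-keyed map, and the `track`/`untrack` helpers are all hypothetical stand-ins for the NodeManager's real types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the NodeManager's resource tracking; only the
// null-return and null-check pattern mirrors what the comment describes.
class LocalizerNullCheckSketch {

    // Maps a resource key to the local directory chosen for it.
    private final Map<String, String> tracked = new HashMap<>();

    void track(String key, String localDir) {
        tracked.put(key, localDir);
    }

    // Simulates a resource being dropped, for example during teardown.
    void untrack(String key) {
        tracked.remove(key);
    }

    // Step 1: return null (and log why) instead of dereferencing a missing entry.
    String getPathForLocalization(String key) {
        String rsrc = tracked.get(key);
        if (rsrc == null) {
            System.err.println("Resource " + key + " is no longer tracked; returning null");
            return null;
        }
        return rsrc + "/" + key;
    }

    // Step 2: the caller checks for null and skips the download.
    // Returns true only when a download would be scheduled.
    boolean processHeartbeat(String key) {
        String path = getPathForLocalization(key);
        if (path == null) {
            return false; // nothing to download, and no NPE
        }
        return true;
    }
}
```

Returning null pushes the decision to each caller, which is why both callers named in the comment need the check.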
[jira] [Updated] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-8649:

Attachment: YARN-8649_2.patch

> Similar as YARN-4355:NPE while processing localizer heartbeat
> -------------------------------------------------------------
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: lujie
> Assignee: lujie
> Priority: Major
> Attachments: YARN-8649.patch, YARN-8649_2.patch, hadoop-hires-nodemanager-hadoop11.log
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The
> reason may be similar to YARN-4355, which was reported by [# Jason Lowe].
[jira] [Commented] (YARN-8015) Complete placement constraint support for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588279#comment-16588279 ]

Weiwei Yang commented on YARN-8015:

Hi [~sunilg], thanks.
{quote}I think we can commit this only to trunk alone
{quote}
I am OK with that. We can have full support for PC and node-attributes in the 3.2 release line.

> Complete placement constraint support for Capacity Scheduler
> ------------------------------------------------------------
>
> Key: YARN-8015
> URL: https://issues.apache.org/jira/browse/YARN-8015
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Priority: Critical
> Attachments: YARN-8015.001.patch, YARN-8015.002.patch, YARN-8015.003.patch, YARN-8015.004.patch
>
> AppPlacementAllocator currently only supports intra-app anti-affinity
> placement constraints. Once YARN-8002 and YARN-8013 are resolved, it needs to
> support inter-app constraints too. Also, this may require some refactoring of
> the existing code logic. Use this JIRA to track.
[jira] [Commented] (YARN-8015) Complete placement constraint support for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588274#comment-16588274 ]

Sunil Govindan commented on YARN-8015:

Thanks [~cheersyang]. Looks fine to me. Committing shortly.

I think we can commit this only to trunk alone, correct?

> Complete placement constraint support for Capacity Scheduler
> ------------------------------------------------------------
>
> Key: YARN-8015
> URL: https://issues.apache.org/jira/browse/YARN-8015
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Priority: Critical
> Attachments: YARN-8015.001.patch, YARN-8015.002.patch, YARN-8015.003.patch, YARN-8015.004.patch
>
> AppPlacementAllocator currently only supports intra-app anti-affinity
> placement constraints. Once YARN-8002 and YARN-8013 are resolved, it needs to
> support inter-app constraints too. Also, this may require some refactoring of
> the existing code logic. Use this JIRA to track.
[jira] [Updated] (YARN-8698) Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zac Zhou updated YARN-8698:

Description:
When a standalone submarine tf job is submitted, the following error is got :
INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)

This error may be related to hadoop classpath
Hadoop env variables of launch_container.sh are as follows:
export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}

run-PRIMARY_WORKER.sh is like:
export HADOOP_YARN_HOME=
export HADOOP_HDFS_HOME=/hadoop-3.1.0
export HADOOP_CONF_DIR=$WORK_DIR

was:
when a standalone submarine tf job is submitted, the following error is got :
INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)

This error may be related to hadoop classpath
Hadoop env variables of launch_container.sh are as follows:
export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}

run-PRIMARY_WORKER.sh is like:
export HADOOP_YARN_HOME=
export HADOOP_HDFS_HOME=/hadoop-3.1.0
export HADOOP_CONF_DIR=$WORK_DIR

> Failed to add hadoop dependencies in docker container when submitting a submarine job
> -------------------------------------------------------------------------------------
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Zac Zhou
> Priority: Major
>
> When a standalone submarine tf job is submitted, the following error is got :
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
> INFO:tensorflow:Done calling model_fn.
> INFO:tensorflow:Create CheckpointSaverHook.
> hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
> (unable to get root cause for java.lang.NoClassDefFoundError)
> (unable to get stack trace for java.lang.NoClassDefFoundError)
> hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
> (unable to get root cause for java.lang.NoClassDefFoundError)
> (unable to get stack trace for java.lang.NoClassDefFoundError)
>
> This error may be related to hadoop classpath
> Hadoop env variables of launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
> export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>
> run-PRIMARY_WORKER.sh is like:
> export HADOOP_YARN_HOME=
> export HADOOP_HDFS_HOME=/hadoop-3.1.0
> export HADOOP_CONF_DIR=$WORK_DIR
[jira] [Updated] (YARN-8698) Failed to add hadoop dependencies in docker container when submitting a submarine job
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zac Zhou updated YARN-8698:

Description:
when a standalone submarine tf job is submitted, the following error is got :
INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)

This error may be related to hadoop classpath
Hadoop env variables of launch_container.sh are as follows:
export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}

run-PRIMARY_WORKER.sh is like:
export HADOOP_YARN_HOME=
export HADOOP_HDFS_HOME=/hadoop-3.1.0
export HADOOP_CONF_DIR=$WORK_DIR

was:
when a standalone submarine tf job is submitted, the following error was got :
INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)

This error may be related to hadoop classpath
Hadoop env variables of launch_container.sh are as follows:
export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}

run-PRIMARY_WORKER.sh is like:
export HADOOP_YARN_HOME=
export HADOOP_HDFS_HOME=/hadoop-3.1.0
export HADOOP_CONF_DIR=$WORK_DIR

> Failed to add hadoop dependencies in docker container when submitting a submarine job
> -------------------------------------------------------------------------------------
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Zac Zhou
> Priority: Major
>
> when a standalone submarine tf job is submitted, the following error is got :
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
> INFO:tensorflow:Done calling model_fn.
> INFO:tensorflow:Create CheckpointSaverHook.
> hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
> (unable to get root cause for java.lang.NoClassDefFoundError)
> (unable to get stack trace for java.lang.NoClassDefFoundError)
> hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
> (unable to get root cause for java.lang.NoClassDefFoundError)
> (unable to get stack trace for java.lang.NoClassDefFoundError)
>
> This error may be related to hadoop classpath
> Hadoop env variables of launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
> export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
> export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>
> run-PRIMARY_WORKER.sh is like:
> export HADOOP_YARN_HOME=
> export HADOOP_HDFS_HOME=/hadoop-3.1.0
> export HADOOP_CONF_DIR=$WORK_DIR
[jira] [Created] (YARN-8698) Failed to add hadoop dependencies in docker container when submitting a submarine job
Zac Zhou created YARN-8698:

Summary: Failed to add hadoop dependencies in docker container when submitting a submarine job
Key: YARN-8698
URL: https://issues.apache.org/jira/browse/YARN-8698
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zac Zhou

when a standalone submarine tf job is submitted, the following error was got :
INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)
hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
(unable to get root cause for java.lang.NoClassDefFoundError)
(unable to get stack trace for java.lang.NoClassDefFoundError)

This error may be related to hadoop classpath
Hadoop env variables of launch_container.sh are as follows:
export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}

run-PRIMARY_WORKER.sh is like:
export HADOOP_YARN_HOME=
export HADOOP_HDFS_HOME=/hadoop-3.1.0
export HADOOP_CONF_DIR=$WORK_DIR
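The `${NAME:-default}` form used in launch_container.sh substitutes the default when the variable is unset or empty, and otherwise keeps the existing value. The sketch below mirrors that rule in Java so the three cases can be checked side by side. `EnvDefaultSketch` and `orDefault` are hypothetical names, and the map stands in for the process environment. Whether this fallback behaviour plays any part in the NoClassDefFoundError reported here is not established in the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: reproduces the shell rule for ${NAME:-fallback}.
class EnvDefaultSketch {

    // Returns the fallback when the variable is unset OR set to an empty
    // string; otherwise returns the value already present.
    static String orDefault(Map<String, String> env, String name, String fallback) {
        String value = env.get(name);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("HADOOP_HDFS_HOME", "/hadoop-3.1.0"); // set to a real path
        env.put("HADOOP_YARN_HOME", "");              // exported but empty
        // HADOOP_COMMON_HOME is left unset on purpose.

        String fallback = "/home/hadoop/yarn-submarine";
        System.out.println(orDefault(env, "HADOOP_HDFS_HOME", fallback));
        System.out.println(orDefault(env, "HADOOP_YARN_HOME", fallback));
        System.out.println(orDefault(env, "HADOOP_COMMON_HOME", fallback));
    }
}
```

An empty export such as `export HADOOP_YARN_HOME=` is therefore treated the same as an unset variable by any `:-` expansion that runs after it.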
[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588238#comment-16588238 ]

Chen Yufei commented on YARN-8513:

[~leftnoteasy] My original config did not specify the two config options, so it should have been using the default values.

I have now applied the configuration suggested by [~cheersyang], so maximum-container-assignments is 10.

> CapacityScheduler infinite loop when queue is near fully utilized
> -----------------------------------------------------------------
>
> Key: YARN-8513
> URL: https://issues.apache.org/jira/browse/YARN-8513
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, yarn
> Affects Versions: 3.1.0, 2.9.1
> Environment: Ubuntu 14.04.5 and 16.04.4
> YARN is configured with one label and 5 queues.
> Reporter: Chen Yufei
> Priority: Major
> Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log, jstack-5.log, top-during-lock.log, top-when-normal.log, yarn3-jstack1.log, yarn3-jstack2.log, yarn3-jstack3.log, yarn3-jstack4.log, yarn3-jstack5.log, yarn3-resourcemanager.log, yarn3-top
>
> ResourceManager sometimes stops responding to any request when a queue is
> nearly fully utilized. Sending SIGTERM won't stop the RM; only SIGKILL can.
> After the RM restarts, it can recover running jobs and start accepting new ones.
>
> It seems that CapacityScheduler is in an infinite loop, printing out the
> following log messages (more than 25,000 lines in a second):
>
> {{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.99816763 absoluteUsedCapacity=0.99816763 used= cluster=}}
> {{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal}}
> {{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1530619767030_1652_01 container=null queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943 clusterResource= type=NODE_LOCAL requestedPartition=}}
>
> I have encountered this problem several times since upgrading to YARN 2.9.1,
> while the same configuration works fine under version 2.7.3.
>
> YARN-4477 is an infinite loop bug in FairScheduler; I am not sure whether this
> is a similar problem.
[jira] [Commented] (YARN-8696) FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588205#comment-16588205 ] genericqa commented on YARN-8696: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 48s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 57s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 24s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 55s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 1s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 16s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 5 new + 225 unchanged - 0 fixed = 230 total (was 225) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 22s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 48s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 43s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 16s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 26s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 49s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 42s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}210m 12s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce
[jira] [Updated] (YARN-8697) LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when cannot resolve resource
[ https://issues.apache.org/jira/browse/YARN-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Botong Huang updated YARN-8697:
---
Issue Type: Sub-task (was: Task)
Parent: YARN-5597

> LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when
> cannot resolve resource
> ---
>
> Key: YARN-8697
> URL: https://issues.apache.org/jira/browse/YARN-8697
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Botong Huang
> Assignee: Botong Huang
> Priority: Major
>
> Right now in LocalityMulticastAMRMProxyPolicy, whenever we cannot resolve the
> resource name (node or rack), we always route the request to the home
> sub-cluster. However, the home sub-cluster might not always be ready to use
> (timed out, YARN-8581) or enabled (by AMRMProxyPolicy weights). It might also
> be overwhelmed by requests if the sub-cluster resolver has some issue. In
> this Jira, we are changing it to pick a random active and enabled sub-cluster
> for resource requests we cannot resolve.
[jira] [Created] (YARN-8697) LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when cannot resolve resource
Botong Huang created YARN-8697:
--
Summary: LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when cannot resolve resource
Key: YARN-8697
URL: https://issues.apache.org/jira/browse/YARN-8697
Project: Hadoop YARN
Issue Type: Task
Reporter: Botong Huang
Assignee: Botong Huang

Right now in LocalityMulticastAMRMProxyPolicy, whenever we cannot resolve the resource name (node or rack), we always route the request to the home sub-cluster. However, the home sub-cluster might not always be ready to use (timed out, YARN-8581) or enabled (by AMRMProxyPolicy weights). It might also be overwhelmed by requests if the sub-cluster resolver has some issue. In this Jira, we are changing it to pick a random active and enabled sub-cluster for resource requests we cannot resolve.
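For readers following the thread, the proposed fallback can be sketched roughly as below. This is only an illustration under assumptions, not the code in the patch: the class name, the method name, the use of plain strings for sub-cluster ids, the "weight greater than zero means enabled" rule, and the final fallback to the home sub-cluster when nothing qualifies are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical illustration; not the actual LocalityMulticastAMRMProxyPolicy code. */
final class RandomSubClusterFallback {

  private RandomSubClusterFallback() {
  }

  /**
   * Picks a sub-cluster for a request whose resource name could not be resolved.
   *
   * @param active ids of sub-clusters currently considered active
   * @param weights policy weight per sub-cluster id (assumption: weight > 0 means enabled)
   * @param homeSubCluster used only when no sub-cluster is both active and enabled
   * @param random source of randomness, injected so callers can seed it
   */
  static String pickForUnresolved(Set<String> active, Map<String, Float> weights,
      String homeSubCluster, Random random) {
    List<String> candidates = new ArrayList<>();
    for (String id : active) {
      Float weight = weights.get(id);
      if (weight != null && weight > 0f) {
        candidates.add(id);
      }
    }
    if (candidates.isEmpty()) {
      // Assumption: with nothing else available, keep the old behaviour.
      return homeSubCluster;
    }
    // Sort so the pick depends only on the Random, not on set iteration order.
    Collections.sort(candidates);
    return candidates.get(random.nextInt(candidates.size()));
  }
}
```

Spreading unresolved requests this way avoids funnelling all of them to a single sub-cluster when the resolver misbehaves, which is the overload case the description mentions.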
[jira] [Commented] (YARN-8298) Yarn Service Upgrade: Support express upgrade of a service
[ https://issues.apache.org/jira/browse/YARN-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588184#comment-16588184 ]

Chandni Singh commented on YARN-8298:
-
Thanks [~eyang]

> Yarn Service Upgrade: Support express upgrade of a service
> --
>
> Key: YARN-8298
> URL: https://issues.apache.org/jira/browse/YARN-8298
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 3.1.1
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8298.001.patch, YARN-8298.002.patch,
> YARN-8298.003.patch, YARN-8298.004.patch, YARN-8298.005.patch,
> YARN-8298.006.patch
>
>
> Currently service upgrade involves 2 steps
> * initiate upgrade by providing new spec
> * trigger upgrade of each instance/component
>
> We need to add the ability to upgrade the service in one shot:
> # Aborting the upgrade will not be supported
> # Upgrade finalization will be done automatically.
[jira] [Commented] (YARN-8298) Yarn Service Upgrade: Support express upgrade of a service
[ https://issues.apache.org/jira/browse/YARN-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588181#comment-16588181 ]

Hudson commented on YARN-8298:
--
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14812 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14812/])
YARN-8298. Added express upgrade for YARN service. (eyang: rev e557c6bd8de2811a561210f672f47b4d07a9d5c6)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/ComponentEvent.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceEvent.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/api/records/ServiceState.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/src/main/java/org/apache/hadoop/yarn/service/client/ApiServiceClient.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceScheduler.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceManager.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestYarnNativeServices.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/src/main/java/org/apache/hadoop/yarn/service/webapp/ApiServer.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/client/ServiceClient.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/instance/ComponentInstance.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/utils/TestServiceApiUtil.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/Component.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestServiceManager.java
* (delete) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestServiceApiUtil.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/proto/ClientAMProtocol.proto
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/AppAdminClient.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/utils/ServiceApiUtil.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ClientAMService.java

> Yarn Service Upgrade: Support express upgrade of a service
> --
>
> Key: YARN-8298
> URL: https://issues.apache.org/jira/browse/YARN-8298
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 3.1.1
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8298.001.patch, YARN-8298.002.patch,
> YARN-8298.003.patch, YARN-8298.004.patch, YARN-8298.005.patch,
> YARN-8298.006.patch
>
>
> Currently service upgrade involves 2 steps
> * initiate upgrade by providing new spec
> * trigger upgrade of each instance/component
>
> We need to add the ability to upgrade the service in one shot:
> # Aborting the upgrade will not be supported
> # Upgrade finalization will be done automatically.
[jira] [Commented] (YARN-8298) Yarn Service Upgrade: Support express upgrade of a service
[ https://issues.apache.org/jira/browse/YARN-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588169#comment-16588169 ]

Eric Yang commented on YARN-8298:
-
+1 Patch 6 looks good to me.

> Yarn Service Upgrade: Support express upgrade of a service
> --
>
> Key: YARN-8298
> URL: https://issues.apache.org/jira/browse/YARN-8298
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 3.1.1
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Attachments: YARN-8298.001.patch, YARN-8298.002.patch,
> YARN-8298.003.patch, YARN-8298.004.patch, YARN-8298.005.patch,
> YARN-8298.006.patch
>
>
> Currently service upgrade involves 2 steps
> * initiate upgrade by providing new spec
> * trigger upgrade of each instance/component
>
> We need to add the ability to upgrade the service in one shot:
> # Aborting the upgrade will not be supported
> # Upgrade finalization will be done automatically.
[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588153#comment-16588153 ] genericqa commented on YARN-8509: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 10 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 39s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 39s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 13 new + 1374 unchanged - 5 fixed = 1387 total (was 1379) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 70m 58s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}123m 7s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8509 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936517/YARN-8509.005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 62bb3eacc75c 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9c3fc3e | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21653/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21653/testReport/ | | Max. process+thread count | 866 (vs. ulimit of 1) | | modules | C:
[jira] [Commented] (YARN-8298) Yarn Service Upgrade: Support express upgrade of a service
[ https://issues.apache.org/jira/browse/YARN-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588152#comment-16588152 ] genericqa commented on YARN-8298: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 15s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 13s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 15s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 9 new + 387 unchanged - 2 fixed = 396 total (was 389) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 23s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 3s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 43s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 51s{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}119m 51s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8298 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936518/YARN-8298.006.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient
[jira] [Assigned] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chandni Singh reassigned YARN-8672:
---
Assignee: Chandni Singh

> TestContainerManager#testLocalingResourceWhileContainerRunning occasionally
> times out
> -
>
> Key: YARN-8672
> URL: https://issues.apache.org/jira/browse/YARN-8672
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.2.0
> Reporter: Jason Lowe
> Assignee: Chandni Singh
> Priority: Major
>
> Precommit builds have been failing in
> TestContainerManager#testLocalingResourceWhileContainerRunning. I have been
> able to reproduce the problem without any patch applied if I run the test
> enough times. It looks like something is removing container tokens from the
> nmPrivate area just as a new localizer starts.
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588083#comment-16588083 ]

Chandni Singh commented on YARN-7644:
-
[~ebadger] I would like to work on this issue. Please re-assign it to me if you are not working on it.

> NM gets backed up deleting docker containers
> 
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Eric Badger
> Assignee: Eric Badger
> Priority: Major
> Labels: Docker
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10
> seconds when we shut down a container. If the container does not stop after
> 10 seconds then we force kill it. However, the {{docker stop}} command is a
> blocking call. So in cases where lots of containers don't go down with the
> initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to
> return. This ties up the ContainerLaunch handler and so these kill events
> back up. It also appears to be backing up new container launches.
[jira] [Updated] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated YARN-7644:

Parent Issue: YARN-8472 (was: YARN-3611)

> NM gets backed up deleting docker containers
> 
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Eric Badger
> Assignee: Eric Badger
> Priority: Major
> Labels: Docker
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10
> seconds when we shut down a container. If the container does not stop after
> 10 seconds then we force kill it. However, the {{docker stop}} command is a
> blocking call. So in cases where lots of containers don't go down with the
> initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to
> return. This ties up the ContainerLaunch handler and so these kill events
> back up. It also appears to be backing up new container launches.
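To make the back-up described in YARN-7644 concrete: if the thread that handles kill events also runs the blocking stop, every slow container holds it for the full timeout. One possible mitigation, sketched here purely as an assumption (the issue text does not say how it will be fixed), is to hand the blocking call to a dedicated pool so the handler returns immediately. `AsyncContainerStopper` and its `blockingStop` callback are hypothetical names, and the callback stands in for whatever actually runs {{docker stop}}.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: moves a blocking stop call off the event-handling thread. */
final class AsyncContainerStopper {
  private final ExecutorService stopPool;
  // Stand-in for the blocking stop; takes a container id.
  private final Consumer<String> blockingStop;

  AsyncContainerStopper(int threads, Consumer<String> blockingStop) {
    this.stopPool = Executors.newFixedThreadPool(threads);
    this.blockingStop = blockingStop;
  }

  /** Queues the stop and returns at once, so the caller is not held for the timeout. */
  void requestStop(String containerId) {
    stopPool.execute(() -> blockingStop.accept(containerId));
  }

  /** Stops accepting work and waits for queued stops; returns false on timeout or interrupt. */
  boolean shutdownAndWait(long timeout, TimeUnit unit) {
    stopPool.shutdown();
    try {
      return stopPool.awaitTermination(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

With a pool of N threads, up to N slow stops can be in flight at once without delaying new launch or kill events; the trade-off is that stop requests beyond N still queue behind the slow ones.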
[jira] [Updated] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service
[ https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated YARN-8675:

Labels: Docker (was: )

> Setting hostname of docker container breaks with "host" networking mode for
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Yesha Vora
> Assignee: Suma Shivaprasad
> Priority: Major
> Labels: Docker
>
> Applications like the Spark AM currently do not run as a YARN service and
> setting hostname breaks driver/executor communication if docker version
> >=1.13.1, especially with wire-encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could
> have a mix of YARN service/native Applications.
> The proposal is to not set the hostname when "host" networking mode is
> enabled.
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588066#comment-16588066 ]

genericqa commented on YARN-7863:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} YARN-3409 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 24s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 24s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 15s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 7s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 33s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 36s{color} | {color:green} YARN-3409 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 59s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 0s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 27s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 34s{color} | {color:red} hadoop-yarn-applications-distributedshell in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 57s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}199m 7s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-7863 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936498/YARN-7863-YARN-3409.008.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4401f7177d81 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Updated] (YARN-8696) FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Botong Huang updated YARN-8696:

    Attachment: YARN-8696.v1.patch

> FederationInterceptor upgrade: home sub-cluster heartbeat async
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Botong Huang
> Assignee: Botong Huang
> Priority: Major
> Attachments: YARN-8696.v1.patch
>
> Today in _FederationInterceptor_, the heartbeat to the home sub-cluster is
> synchronous. After the heartbeat is sent to the home sub-cluster, the
> interceptor waits for the home response to come back before merging and
> returning the (merged) heartbeat result back to the AM. If the home
> sub-cluster is suffering from connection issues, or is down during a YarnRM
> master-slave switch, all heartbeat threads in _FederationInterceptor_ will
> be blocked waiting for the home response. As a result, the successful UAM
> heartbeats from secondary sub-clusters will not be returned to the AM at
> all. Additionally, because we kept the same heartbeat responseId between
> the AM and the home RM, a lot of tricky handling is needed for the
> responseId resync when it comes to _FederationInterceptor_ (part of
> AMRMProxy, NM) work-preserving restart (YARN-6127, YARN-1336), home RM
> master-slave switch, etc.
> In this patch, we change the heartbeat to the home sub-cluster to be
> asynchronous, the same way we handle UAM heartbeats in secondaries, so that
> a sub-cluster being down or having connection issues won't prevent the AM
> from getting responses from other sub-clusters. The responseId is also
> managed separately for the home sub-cluster and the AM, and the two
> increment independently. The resync logic becomes much cleaner.
[jira] [Created] (YARN-8696) FederationInterceptor upgrade: home sub-cluster heartbeat async
Botong Huang created YARN-8696:

Summary: FederationInterceptor upgrade: home sub-cluster heartbeat async
Key: YARN-8696
URL: https://issues.apache.org/jira/browse/YARN-8696
Project: Hadoop YARN
Issue Type: Task
Reporter: Botong Huang
Assignee: Botong Huang

Today in _FederationInterceptor_, the heartbeat to the home sub-cluster is synchronous. After the heartbeat is sent to the home sub-cluster, the interceptor waits for the home response to come back before merging and returning the (merged) heartbeat result back to the AM. If the home sub-cluster is suffering from connection issues, or is down during a YarnRM master-slave switch, all heartbeat threads in _FederationInterceptor_ will be blocked waiting for the home response. As a result, the successful UAM heartbeats from secondary sub-clusters will not be returned to the AM at all. Additionally, because we kept the same heartbeat responseId between the AM and the home RM, a lot of tricky handling is needed for the responseId resync when it comes to _FederationInterceptor_ (part of AMRMProxy, NM) work-preserving restart (YARN-6127, YARN-1336), home RM master-slave switch, etc.

In this patch, we change the heartbeat to the home sub-cluster to be asynchronous, the same way we handle UAM heartbeats in secondaries, so that a sub-cluster being down or having connection issues won't prevent the AM from getting responses from other sub-clusters. The responseId is also managed separately for the home sub-cluster and the AM, and the two increment independently. The resync logic becomes much cleaner.
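For readers following the thread, the asynchronous home heartbeat described in YARN-8696 can be pictured roughly as below. This is only a minimal sketch under stated assumptions, not the patch itself: the class `AsyncHomeHeartbeatSketch`, the `IntFunction<String>` standing in for the home RM call, and the string "responses" are all hypothetical. It shows the two ideas from the description: the AM-facing call never waits for the home RM, and the AM-side and home-side responseIds are separate counters that advance independently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

/** Hypothetical stand-in for the interceptor; not the real FederationInterceptor. */
class AsyncHomeHeartbeatSketch {
  // responseId exchanged with the AM; advances once per AM heartbeat.
  private final AtomicInteger amResponseId = new AtomicInteger();
  // responseId exchanged with the home RM; advances once per completed home heartbeat.
  private final AtomicInteger homeResponseId = new AtomicInteger();
  // Home responses received since the AM last collected them.
  private final ConcurrentLinkedQueue<String> homeResponses = new ConcurrentLinkedQueue<>();
  // A single worker keeps home heartbeats ordered without blocking the AM-facing thread.
  private final ExecutorService homeHeartbeater = Executors.newSingleThreadExecutor();
  // Hypothetical home RM call: takes the home responseId and returns a response.
  private final IntFunction<String> homeRm;

  AsyncHomeHeartbeatSketch(IntFunction<String> homeRm) {
    this.homeRm = homeRm;
  }

  /** Handles one AM heartbeat: queues a home heartbeat, returns whatever has already arrived. */
  List<String> allocate() {
    homeHeartbeater.execute(() -> {
      String response = homeRm.apply(homeResponseId.get()); // may be slow or fail
      homeResponseId.incrementAndGet();
      homeResponses.add(response);
    });
    amResponseId.incrementAndGet();
    return drainHomeResponses();
  }

  /** Removes and returns all home responses received so far, in arrival order. */
  List<String> drainHomeResponses() {
    List<String> drained = new ArrayList<>();
    for (String r = homeResponses.poll(); r != null; r = homeResponses.poll()) {
      drained.add(r);
    }
    return drained;
  }

  /** Stops accepting heartbeats and waits briefly for in-flight ones to finish. */
  void shutdownAndWait() {
    homeHeartbeater.shutdown();
    try {
      homeHeartbeater.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  int getAmResponseId() { return amResponseId.get(); }

  int getHomeResponseId() { return homeResponseId.get(); }

  public static void main(String[] args) {
    AsyncHomeHeartbeatSketch sketch = new AsyncHomeHeartbeatSketch(id -> "home-response-" + id);
    List<String> seen = new ArrayList<>(sketch.allocate());
    seen.addAll(sketch.allocate());
    sketch.shutdownAndWait();
    seen.addAll(sketch.drainHomeResponses());
    System.out.println(seen + " am=" + sketch.getAmResponseId()
        + " home=" + sketch.getHomeResponseId());
  }
}
```

A slow or failing home RM call in this sketch only delays the home-side counter; the AM-side counter and the AM-facing return path are unaffected, which is the property the description is after.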
[jira] [Updated] (YARN-8696) FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Botong Huang updated YARN-8696:

    Issue Type: Sub-task (was: Task)
    Parent: YARN-5597

> FederationInterceptor upgrade: home sub-cluster heartbeat async
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Botong Huang
> Assignee: Botong Huang
> Priority: Major
>
> Today in _FederationInterceptor_, the heartbeat to the home sub-cluster is
> synchronous. After the heartbeat is sent to the home sub-cluster, the
> interceptor waits for the home response to come back before merging and
> returning the (merged) heartbeat result back to the AM. If the home
> sub-cluster is suffering from connection issues, or is down during a YarnRM
> master-slave switch, all heartbeat threads in _FederationInterceptor_ will
> be blocked waiting for the home response. As a result, the successful UAM
> heartbeats from secondary sub-clusters will not be returned to the AM at
> all. Additionally, because we kept the same heartbeat responseId between
> the AM and the home RM, a lot of tricky handling is needed for the
> responseId resync when it comes to _FederationInterceptor_ (part of
> AMRMProxy, NM) work-preserving restart (YARN-6127, YARN-1336), home RM
> master-slave switch, etc.
> In this patch, we change the heartbeat to the home sub-cluster to be
> asynchronous, the same way we handle UAM heartbeats in secondaries, so that
> a sub-cluster being down or having connection issues won't prevent the AM
> from getting responses from other sub-clusters. The responseId is also
> managed separately for the home sub-cluster and the AM, and the two
> increment independently. The resync logic becomes much cleaner.
[jira] [Commented] (YARN-8298) Yarn Service Upgrade: Support express upgrade of a service
[ https://issues.apache.org/jira/browse/YARN-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588018#comment-16588018 ] Chandni Singh commented on YARN-8298: - [~eyang] patch 6 includes the change for upgrading component by component. However, if a component upgrade fails, manual intervention is required. > Yarn Service Upgrade: Support express upgrade of a service > -- > > Key: YARN-8298 > URL: https://issues.apache.org/jira/browse/YARN-8298 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.1 >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8298.001.patch, YARN-8298.002.patch, > YARN-8298.003.patch, YARN-8298.004.patch, YARN-8298.005.patch, > YARN-8298.006.patch > > > Currently service upgrade involves 2 steps > * initiate upgrade by providing new spec > * trigger upgrade of each instance/component > > We need to add the ability to upgrade the service in one shot: > # Aborting the upgrade will not be supported > # Upgrade finalization will be done automatically. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8298) Yarn Service Upgrade: Support express upgrade of a service
[ https://issues.apache.org/jira/browse/YARN-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8298: Attachment: YARN-8298.006.patch > Yarn Service Upgrade: Support express upgrade of a service > -- > > Key: YARN-8298 > URL: https://issues.apache.org/jira/browse/YARN-8298 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.1 >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8298.001.patch, YARN-8298.002.patch, > YARN-8298.003.patch, YARN-8298.004.patch, YARN-8298.005.patch, > YARN-8298.006.patch > > > Currently service upgrade involves 2 steps > * initiate upgrade by providing new spec > * trigger upgrade of each instance/component > > We need to add the ability to upgrade the service in one shot: > # Aborting the upgrade will not be supported > # Upgrade finalization will be done automatically. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588010#comment-16588010 ]

Zian Chen commented on YARN-8509:

Fixed the failed UTs and re-uploaded the patch.

> Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Zian Chen
> Assignee: Zian Chen
> Priority: Major
> Labels: capacityscheduler
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch, YARN-8509.003.patch, YARN-8509.004.patch, YARN-8509.005.patch
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the
> total pending resource based on user-limit percent and user-limit factor,
> which caps the pending resource for each user at the minimum of the
> user-limit pending and the actual pending. This prevents the queue from
> taking more pending resource to achieve queue balance after all queues are
> satisfied with their ideal allocation.
>
> We need to change the logic so that queue pending can go beyond the user
> limit.
[jira] [Updated] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8509:

    Attachment: YARN-8509.005.patch

> Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Zian Chen
> Assignee: Zian Chen
> Priority: Major
> Labels: capacityscheduler
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch, YARN-8509.003.patch, YARN-8509.004.patch, YARN-8509.005.patch
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the
> total pending resource based on user-limit percent and user-limit factor,
> which caps the pending resource for each user at the minimum of the
> user-limit pending and the actual pending. This prevents the queue from
> taking more pending resource to achieve queue balance after all queues are
> satisfied with their ideal allocation.
>
> We need to change the logic so that queue pending can go beyond the user
> limit.
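To make the capping described in YARN-8509 concrete, here is a small arithmetic sketch. It is not the LeafQueue code: the class `PendingResourceSketch`, the `UserUsage` holder, and the use of a single memory dimension in MB are assumptions for illustration, and "user-limit pending" is assumed to mean the user limit minus what the user already uses, floored at zero.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of capped versus uncapped pending; not the LeafQueue code. */
class PendingResourceSketch {
  /** Hypothetical per-user snapshot of a leaf queue, memory only, in MB. */
  static final class UserUsage {
    final long usedMb;
    final long pendingMb;

    UserUsage(long usedMb, long pendingMb) {
      this.usedMb = usedMb;
      this.pendingMb = pendingMb;
    }
  }

  /** Behaviour as described: each user's pending is capped by that user's remaining limit. */
  static long pendingCappedByUserLimit(List<UserUsage> users, long userLimitMb) {
    long total = 0;
    for (UserUsage u : users) {
      // Assumed meaning of "user-limit pending": headroom left under the user limit.
      long userLimitPending = Math.max(0, userLimitMb - u.usedMb);
      total += Math.min(userLimitPending, u.pendingMb);
    }
    return total;
  }

  /** Direction proposed in the issue: queue pending may go beyond the user limit. */
  static long pendingIgnoringUserLimit(List<UserUsage> users) {
    long total = 0;
    for (UserUsage u : users) {
      total += u.pendingMb;
    }
    return total;
  }

  public static void main(String[] args) {
    List<UserUsage> users = Arrays.asList(
        new UserUsage(3072, 4096),   // only 1024 MB of headroom under a 4096 MB limit
        new UserUsage(1024, 2048));  // 3072 MB of headroom, so pending is not capped
    System.out.println("capped   = " + pendingCappedByUserLimit(users, 4096));
    System.out.println("uncapped = " + pendingIgnoringUserLimit(users));
  }
}
```

With the sample numbers the capped total is 1024 + 2048 = 3072 MB while the uncapped total is 6144 MB; the difference is the pending demand that preemption would not see under the capped calculation.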
[jira] [Commented] (YARN-8581) [AMRMProxy] Add sub-cluster timeout in LocalityMulticastAMRMProxyPolicy
[ https://issues.apache.org/jira/browse/YARN-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588006#comment-16588006 ] Botong Huang commented on YARN-8581: Thanks [~giovanni.fumarola] for the review! > [AMRMProxy] Add sub-cluster timeout in LocalityMulticastAMRMProxyPolicy > --- > > Key: YARN-8581 > URL: https://issues.apache.org/jira/browse/YARN-8581 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy, federation >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Attachments: YARN-8581-branch-2.v2.patch, YARN-8581.v1.patch, > YARN-8581.v2.patch > > > In Federation, every time an AM heartbeat comes in, > LocalityMulticastAMRMProxyPolicy in AMRMProxy splits the asks according to > the list of active and enabled sub-clusters. However, if we haven't been able > to heartbeat to a sub-cluster for some time (network issues, or we keep > hitting some exception from YarnRM, or YarnRM master-slave switch is taking a > long time etc.), we should consider the sub-cluster as unhealthy and stop > routing asks there, until the heartbeat channel becomes healthy again. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
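The sub-cluster timeout idea in YARN-8581 amounts to a filter over the active and enabled sub-clusters. The following is a rough sketch only: the class and method names, the string sub-cluster ids, and the choice to keep a sub-cluster that has no recorded heartbeat yet are all assumptions, not taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical filter; not the real LocalityMulticastAMRMProxyPolicy. */
class SubClusterTimeoutSketch {
  /**
   * Returns the sub-clusters that asks may still be routed to: those that are
   * active and enabled and whose last successful heartbeat is within timeoutMs.
   */
  static Set<String> routableSubClusters(Set<String> activeAndEnabled,
      Map<String, Long> lastSuccessfulHeartbeatMs, long nowMs, long timeoutMs) {
    Set<String> routable = new TreeSet<>();
    for (String subClusterId : activeAndEnabled) {
      Long last = lastSuccessfulHeartbeatMs.get(subClusterId);
      // Assumption: a sub-cluster we have never heartbeated to is not treated as timed out.
      if (last == null || nowMs - last <= timeoutMs) {
        routable.add(subClusterId);
      }
    }
    return routable;
  }

  public static void main(String[] args) {
    Set<String> active = new TreeSet<>();
    active.add("sc1");
    active.add("sc2");
    active.add("sc3");
    Map<String, Long> last = new HashMap<>();
    last.put("sc1", 1_000L);   // stale: 99 s ago with a 60 s timeout
    last.put("sc2", 90_000L);  // fresh: 10 s ago
    System.out.println(routableSubClusters(active, last, 100_000L, 60_000L));
  }
}
```

Once a heartbeat to the stale sub-cluster succeeds again, its timestamp is refreshed and it rejoins the routable set on the next call, which matches the "until the heartbeat channel becomes healthy again" wording.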
[jira] [Commented] (YARN-8673) [AMRMProxy] More robust responseId resync after an YarnRM master slave switch
[ https://issues.apache.org/jira/browse/YARN-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588003#comment-16588003 ]

Botong Huang commented on YARN-8673:

Thanks [~giovanni.fumarola]!

> [AMRMProxy] More robust responseId resync after an YarnRM master slave switch
>
> Key: YARN-8673
> URL: https://issues.apache.org/jira/browse/YARN-8673
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: amrmproxy
> Reporter: Botong Huang
> Assignee: Botong Huang
> Priority: Major
> Attachments: YARN-8673-branch-2.v2.patch, YARN-8673.v1.patch, YARN-8673.v2.patch
>
> After a master-slave switch of YarnRM, an _ApplicationNotRegisteredException_
> will be thrown from the new YarnRM. The AM will re-register and reset the
> responseId to zero. _AMRMClientRelayer_ inside _FederationInterceptor_
> follows the same protocol, and does the automatic re-register and responseId
> resync. However, when exceptions or temporary network issues happen in the
> allocate call after re-register, the resync logic might be broken. This
> patch improves the robustness of the process by parsing the expected
> responseId from the YarnRM exception message, so that whenever the
> responseId is out of sync for whatever reason, we can automatically resync
> and move on.
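The resync approach in YARN-8673 relies on pulling the expected responseId out of the RM's exception message. A minimal sketch of that parsing step follows. The exact wording of the message is an assumption here (the phrase "expect responseId to be N" is used purely for illustration), and `ResponseIdResyncSketch` is a hypothetical helper, not the AMRMClientRelayer code.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; the message format below is assumed, not taken from the patch. */
class ResponseIdResyncSketch {
  // Assumed wording of the RM exception message; at most 9 digits so parseInt cannot overflow.
  private static final Pattern EXPECTED_ID =
      Pattern.compile("expect responseId to be (\\d{1,9})");

  /** Returns the responseId the RM says it expects, or empty if the message does not say. */
  static OptionalInt parseExpectedResponseId(String exceptionMessage) {
    if (exceptionMessage == null) {
      return OptionalInt.empty();
    }
    Matcher m = EXPECTED_ID.matcher(exceptionMessage);
    return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
  }

  public static void main(String[] args) {
    String msg = "Invalid responseId: expect responseId to be 7, but get 3";
    // On a match the caller would reset its local responseId to this value and retry.
    System.out.println(parseExpectedResponseId(msg));
    System.out.println(parseExpectedResponseId("some unrelated failure"));
  }
}
```

Returning an empty result for unrecognized messages lets the caller fall back to its existing re-register path instead of guessing a responseId.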
[jira] [Commented] (YARN-8673) [AMRMProxy] More robust responseId resync after an YarnRM master slave switch
[ https://issues.apache.org/jira/browse/YARN-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587946#comment-16587946 ]

Giovanni Matteo Fumarola commented on YARN-8673:

Committed to branch-2 as well.

> [AMRMProxy] More robust responseId resync after an YarnRM master slave switch
>
> Key: YARN-8673
> URL: https://issues.apache.org/jira/browse/YARN-8673
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: amrmproxy
> Reporter: Botong Huang
> Assignee: Botong Huang
> Priority: Major
> Attachments: YARN-8673-branch-2.v2.patch, YARN-8673.v1.patch, YARN-8673.v2.patch
>
> After a master-slave switch of YarnRM, an _ApplicationNotRegisteredException_
> will be thrown from the new YarnRM. The AM will re-register and reset the
> responseId to zero. _AMRMClientRelayer_ inside _FederationInterceptor_
> follows the same protocol, and does the automatic re-register and responseId
> resync. However, when exceptions or temporary network issues happen in the
> allocate call after re-register, the resync logic might be broken. This
> patch improves the robustness of the process by parsing the expected
> responseId from the YarnRM exception message, so that whenever the
> responseId is out of sync for whatever reason, we can automatically resync
> and move on.
[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587944#comment-16587944 ]

Jason Lowe commented on YARN-8649:

Thanks for the analysis and patch, [~xiaoheipangzi]! Is ignoring the null the right thing to do here? This is in the middle of trying to find a path to localize a resource, and if the NM doesn't know about the resource then it seems inappropriate to go ahead and find a local path to put the resource and let the localizer go ahead and download it. That will be a waste of network and disk resources at best, or an outright leak of disk space at worst if it's not cleaned up when the localizer finishes the download and reports completion on a resource the NM doesn't know about.

> Similar as YARN-4355:NPE while processing localizer heartbeat
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: lujie
> Assignee: lujie
> Priority: Major
> Attachments: YARN-8649.patch, hadoop-hires-nodemanager-hadoop11.log
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The
> reason may be similar to YARN-4355, which was reported by Jason Lowe.
[jira] [Comment Edited] (YARN-8581) [AMRMProxy] Add sub-cluster timeout in LocalityMulticastAMRMProxyPolicy
[ https://issues.apache.org/jira/browse/YARN-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586549#comment-16586549 ] Giovanni Matteo Fumarola edited comment on YARN-8581 at 8/21/18 8:07 PM: - Thanks [~botong] . Committed to trunk and branch-2. was (Author: giovanni.fumarola): Thanks [~botong] . Committed to trunk. > [AMRMProxy] Add sub-cluster timeout in LocalityMulticastAMRMProxyPolicy > --- > > Key: YARN-8581 > URL: https://issues.apache.org/jira/browse/YARN-8581 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy, federation >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Attachments: YARN-8581-branch-2.v2.patch, YARN-8581.v1.patch, > YARN-8581.v2.patch > > > In Federation, every time an AM heartbeat comes in, > LocalityMulticastAMRMProxyPolicy in AMRMProxy splits the asks according to > the list of active and enabled sub-clusters. However, if we haven't been able > to heartbeat to a sub-cluster for some time (network issues, or we keep > hitting some exception from YarnRM, or YarnRM master-slave switch is taking a > long time etc.), we should consider the sub-cluster as unhealthy and stop > routing asks there, until the heartbeat channel becomes healthy again. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587903#comment-16587903 ]

genericqa commented on YARN-8468:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 10 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 39s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 14s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 41 new + 558 unchanged - 15 fixed = 599 total (was 573) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 40s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 28s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 36s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}140m 5s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMService |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8468 |
| JIRA Patch URL |
[jira] [Commented] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service
[ https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587873#comment-16587873 ]

Eric Yang commented on YARN-8675:

Docker only prevents the --hostname and --net=host flags from being combined in older versions of docker (1.13). See [Docker issue|https://github.com/moby/moby/pull/29144]. The following table illustrates the possible combinations, and when we should and should not support a customized hostname:

|| YARN Registry DNS || YARN Service || Custom AM || Network Type || Custom Hostname ||
| Enabled | Yes | No | Host | Yes |
| Enabled | No | Yes | Host | No |
| Disabled | No | Yes | Host | No |
| Disabled | No | Yes | Bridge | N/A |

Today registryDNS and YARN service are coupled together; only YARN service knows how to populate hostname information into registryDNS. If a custom AM creates its own logic to generate a custom hostname, it must have some way to populate RegistryDNS so that the name resolves correctly. Without using YARN service, there is no programmable API to customize the hostname. This is the reason that Spark on YARN cluster on docker mode fails with buzzard hostname composition. To resolve the Spark issue, it is entirely possible to run Spark using the YARN service API without making any code changes to Spark standalone mode.

To fix this issue properly, it would be best to provide a hint to the docker runtime so that it can decide whether a custom hostname can be supported. Custom hostname is a new concept that did not exist prior to docker containers. Therefore, in my view, only new applications using YARN service should be supported.

> Setting hostname of docker container breaks with "host" networking mode for > Apps which do not run as a YARN service > --- > > Key: YARN-8675 > URL: https://issues.apache.org/jira/browse/YARN-8675 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Major > > Applications like the Spark AM currently do not run as a YARN service and > setting hostname breaks driver/executor communication if docker version > >=1.13.1 , especially with wire-encryption turned on. > YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could > have a mix of YARN service/native Applications. > The proposal is to not set the hostname when "host" networking mode is > enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
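The host-network rows of the table in Eric Yang's comment reduce to a single condition, sketched below. This is only an illustration of the table, with hypothetical names (`CustomHostnameSketch`, `shouldSetCustomHostnameInHostMode`); it is not the docker runtime code, and it deliberately leaves out the bridge row, which the table marks as N/A.

```java
/** Hypothetical encoding of the host-network rows of the table; not the runtime code. */
class CustomHostnameSketch {
  /**
   * For "host" networking only: a custom hostname is supported when Registry DNS
   * is enabled and the application runs as a YARN service (first table row).
   * A custom AM gets no custom hostname, with or without Registry DNS.
   */
  static boolean shouldSetCustomHostnameInHostMode(boolean registryDnsEnabled,
      boolean runsAsYarnService) {
    return registryDnsEnabled && runsAsYarnService;
  }

  public static void main(String[] args) {
    // Rows of the table with Network Type = Host.
    System.out.println(shouldSetCustomHostnameInHostMode(true, true));   // YARN service
    System.out.println(shouldSetCustomHostnameInHostMode(true, false));  // custom AM, DNS on
    System.out.println(shouldSetCustomHostnameInHostMode(false, false)); // custom AM, DNS off
  }
}
```

The "hint to the docker runtime" mentioned in the comment would be whatever carries the `runsAsYarnService` information to the point where the hostname is set; how that hint is passed is not specified in the thread.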
[jira] [Updated] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil Govindan updated YARN-7863:

    Attachment: YARN-7863-YARN-3409.008.patch

> Modify placement constraints to support node attributes
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Sunil Govindan
> Assignee: Sunil Govindan
> Priority: Major
> Attachments: YARN-7863-YARN-3409.002.patch, YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, YARN-7863.v0.patch
>
> This Jira tracks the work to *modify existing placement constraints to
> support node attributes*.
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587849#comment-16587849 ]

Sunil Govindan commented on YARN-7863:

Updated v8 patch. cc [~cheersyang] [~Naganarasimha]

> Modify placement constraints to support node attributes
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Sunil Govindan
> Assignee: Sunil Govindan
> Priority: Major
> Attachments: YARN-7863-YARN-3409.002.patch, YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, YARN-7863.v0.patch
>
> This Jira tracks the work to *modify existing placement constraints to
> support node attributes*.
[jira] [Updated] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service
[ https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8675: - Reporter: Yesha Vora (was: Suma Shivaprasad) > Setting hostname of docker container breaks with "host" networking mode for > Apps which do not run as a YARN service > --- > > Key: YARN-8675 > URL: https://issues.apache.org/jira/browse/YARN-8675 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Major > > Applications like the Spark AM currently do not run as a YARN service and > setting hostname breaks driver/executor communication if docker version > >=1.13.1 , especially with wire-encryption turned on. > YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could > have a mix of YARN service/native Applications. > The proposal is to not set the hostname when "host" networking mode is > enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8572) YarnClient getContainers API should support filtering by container status
[ https://issues.apache.org/jira/browse/YARN-8572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi reassigned YARN-8572: --- Assignee: Abhishek Modi > YarnClient getContainers API should support filtering by container status > - > > Key: YARN-8572 > URL: https://issues.apache.org/jira/browse/YARN-8572 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Suma Shivaprasad >Assignee: Abhishek Modi >Priority: Major > > YarnClient.getContainers should support filtering containers by their status > - RUNNING, COMPLETED etc . This may require corresponding changes in ATS to > filter by container status for a given application attempt -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
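As a rough picture of the filtering requested in YARN-8572, the sketch below filters a list of container records by status on the caller's side. Everything in it is hypothetical: the `State` enum, the `ContainerRecord` holder, and the rule that an empty filter means "no filtering" are assumptions for illustration and do not reflect the YarnClient or ATS APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical client-side filter; not the YarnClient API. */
class ContainerStatusFilterSketch {
  /** Hypothetical status values, named after the examples in the issue. */
  enum State { RUNNING, COMPLETED }

  /** Hypothetical container record with just an id and a status. */
  static final class ContainerRecord {
    final String id;
    final State state;

    ContainerRecord(String id, State state) {
      this.id = id;
      this.state = state;
    }
  }

  /** Returns the ids of containers whose status is in wanted; an empty set keeps all. */
  static List<String> idsWithStatus(List<ContainerRecord> containers, Set<State> wanted) {
    List<String> ids = new ArrayList<>();
    for (ContainerRecord c : containers) {
      if (wanted.isEmpty() || wanted.contains(c.state)) {
        ids.add(c.id);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    List<ContainerRecord> containers = Arrays.asList(
        new ContainerRecord("c1", State.RUNNING),
        new ContainerRecord("c2", State.COMPLETED),
        new ContainerRecord("c3", State.RUNNING));
    System.out.println(idsWithStatus(containers, EnumSet.of(State.RUNNING)));
  }
}
```

Filtering on the server side, as the issue suggests for ATS, would avoid shipping the full container list to the client; the sketch only shows the selection rule itself.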
[jira] [Assigned] (YARN-8663) Opportunistic Container property "mapreduce.job.num-opportunistic-maps-percent" is throwing wrong exception at wrong sequence
[ https://issues.apache.org/jira/browse/YARN-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi reassigned YARN-8663: --- Assignee: Abhishek Modi > Opportunistic Container property > "mapreduce.job.num-opportunistic-maps-percent" is throwing wrong exception at > wrong sequence > - > > Key: YARN-8663 > URL: https://issues.apache.org/jira/browse/YARN-8663 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.1.1 > Environment: Secure Installation with Kerberos ON. >Reporter: Akshay Agarwal >Assignee: Abhishek Modi >Priority: Major > > Pre-requisites: > {code:java} > 1. Install HA cluster. > 2. Set yarn.nodemanager.opportunistic-containers-max-queue-length=(positive > integer value)[NodeManager->yarn-site.xml] > 3. Set yarn.resourcemanager.opportunistic-container-allocation.enabled= > true[ResourceManager->yarn-site.xml] > {code} > > Steps to reproduce: > {code:java} > 1. Keep all NodeManagers up > 2. Submit a job with -Dmapreduce.job.num-opportunistic-maps-percent="abh" or > "2.5" > {code} > Expected Result: > {code:java} > Should throw an exception stating "NumberFormatException" before writing > the input for mappers. > {code} > Log Details: > {code:java} > 2018-08-14 18:15:54,049 INFO mapreduce.Job: map 0% reduce 0% > 2018-08-14 18:15:54,069 INFO mapreduce.Job: Job job_1534236847054_0005 failed > with state FAILED due to: Application application_1534236847054_0005 failed 2 > times due to AM Container for appattempt_1534236847054_0005_02 exited > with exitCode: 1 > Failing this attempt.Diagnostics: [2018-08-14 18:15:53.110]Exception from > container-launch. > Container id: container_e31_1534236847054_0005_02_01 > Exit code: 1 > [2018-08-14 18:15:53.113]Container exited with a non-zero exit code 1. Error > file: prelaunch.err.
> Last 4096 bytes of prelaunch.err : > Last 4096 bytes of stderr : > Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; > support was removed in 8.0 > Aug 14, 2018 6:15:51 PM > com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register > INFO: Registering > org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider > class > Aug 14, 2018 6:15:51 PM > com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register > INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a > provider class > Aug 14, 2018 6:15:51 PM > com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register > INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as > a root resource class > Aug 14, 2018 6:15:51 PM > com.sun.jersey.server.impl.application.WebApplicationImpl _initiate > INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25 > AM' > Aug 14, 2018 6:15:51 PM > com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory > getComponentProvider > INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver > to GuiceManagedComponentProvider with the scope "Singleton" > Aug 14, 2018 6:15:52 PM > com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory > getComponentProvider > INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to > GuiceManagedComponentProvider with the scope "Singleton" > Aug 14, 2018 6:15:52 PM > com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory > getComponentProvider > INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to > GuiceManagedComponentProvider with the scope "PerRequest" > log4j:WARN No appenders could be found for logger > (org.apache.hadoop.mapreduce.v2.app.MRAppMaster). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. 
> {code}
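The expected result in YARN-8663 amounts to validating the property at submission time instead of letting the AM container fail later. A minimal sketch of such an early check follows. The class OpportunisticPercentValidator and the method parsePercent are hypothetical, and the 0 to 100 range check is an assumption of this sketch, not something stated in the issue.

```java
// Hypothetical early validation for
// mapreduce.job.num-opportunistic-maps-percent; not the actual MapReduce code.
final class OpportunisticPercentValidator {
    static final String PROPERTY = "mapreduce.job.num-opportunistic-maps-percent";

    private OpportunisticPercentValidator() {}

    /**
     * Parses the raw property value as an integer percentage.
     *
     * @throws NumberFormatException    if the value is missing or not an integer
     *                                  (for example "abh" or "2.5")
     * @throws IllegalArgumentException if the value is outside 0..100
     *                                  (range assumed for this sketch)
     */
    static int parsePercent(String raw) {
        if (raw == null) {
            throw new NumberFormatException(PROPERTY + " is not set");
        }
        final int value;
        try {
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Re-throw with the property name so the client sees what to fix.
            throw new NumberFormatException(
                PROPERTY + " must be an integer, got \"" + raw + "\"");
        }
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException(
                PROPERTY + " must be between 0 and 100, got " + value);
        }
        return value;
    }
}
```

Calling a check like this in the job client before the input splits are written would surface the error at the point the issue asks for.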
[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587715#comment-16587715 ] genericqa commented on YARN-8649: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 42s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 103 unchanged - 0 fixed = 104 total (was 103) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 58s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 58s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 66m 56s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8649 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936466/YARN-8649.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 70258d99e914 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9c3fc3e | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21649/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21649/testReport/ | | Max. process+thread count | 439 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U:
[jira] [Updated] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-8468: - Attachment: YARN-8468.005.patch > Limit container sizes per queue in FairScheduler > > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Critical > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, > YARN-8468.005.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers, cannot be limited by queue, and is not scheduler dependent. > > The goal of this ticket is to allow this value to be set on a per queue basis. > > The use case: User has two pools, one for ad hoc jobs and one for enterprise > apps. User wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. Setting > yarn.scheduler.maximum-allocation-mb sets a default maximum > container size for all queues, and the > “maxContainerResources” queue config value sets the maximum resources per queue. > > Suggested solution: > > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf), this will cover dynamically created queues. > * if we set it on the root we override the scheduler setting and we should > not allow that. > * make sure that queue resource cap can not be larger than scheduler max > resource cap in the config.
> * implement getMaximumResourceCapability(String queueName) in the > FairScheduler > * implement getMaximumResourceCapability() in both FSParentQueue and > FSLeafQueue as follows > * expose the setting in the queue information in the RM web UI. > * expose the setting in the metrics etc for the queue. > * write JUnit tests. > * update the scheduler documentation.
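The capping rule in the YARN-8468 suggestion list (a queue limit may never exceed the scheduler-wide limit) could look roughly like the sketch below. Res and effectiveMax are hypothetical names, not the FairScheduler types, and clamping the queue value is one possible reading; the patch could equally reject such a configuration at load time.

```java
// Illustrative only: Res stands in for a YARN resource, and effectiveMax is a
// hypothetical helper, not a FairScheduler method.
final class QueueMaxAllocationSketch {
    private QueueMaxAllocationSketch() {}

    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /**
     * Returns the maximum container size that applies to a queue.
     *
     * @param schedulerMax scheduler-wide maximum (for example from
     *                     yarn.scheduler.maximum-allocation-mb)
     * @param queueMax     per-queue maximum, or null when the queue sets none
     */
    static Res effectiveMax(Res schedulerMax, Res queueMax) {
        if (queueMax == null) {
            // No per-queue setting: fall back to the scheduler-wide default.
            return schedulerMax;
        }
        // Component-wise minimum, so the queue can only tighten the limit.
        return new Res(
            Math.min(schedulerMax.memoryMb, queueMax.memoryMb),
            Math.min(schedulerMax.vcores, queueMax.vcores));
    }
}
```

In the use case from the ticket, the ad hoc queue would set a small queueMax while the enterprise queue leaves it unset and inherits the scheduler maximum.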
[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587681#comment-16587681 ] Eric Badger commented on YARN-8648: --- IMO I like the idea of actually dealing with this problem via proposal 1, but it seems like a much bigger effort that has many corner cases and realistically is going to require a whole new resource module controller. Therefore, I think we shouldn't let perfect get in the way of good and move forward with the simpler approach of removing the cgroups via the container-executor. I don't _like_ this solution, but I think it is a stopgap until we are able to really fix the underlying issues. > Container cgroups are leaked when using docker > -- > > Key: YARN-8648 > URL: https://issues.apache.org/jira/browse/YARN-8648 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: Docker > > When you run with docker and enable cgroups for cpu, docker creates cgroups > for all resources on the system, not just for cpu. For instance, if the > {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, > the nodemanager will create a cgroup for each container under > {{/sys/fs/cgroup/cpu/hadoop-yarn}}. In the docker case, we pass this path > via the {{--cgroup-parent}} command line argument. Docker then creates a > cgroup for the docker container under that, for instance: > {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}. > When the container exits, docker cleans up the {{docker_container_id}} > cgroup, and the nodemanager cleans up the {{container_id}} cgroup. All is > good under {{/sys/fs/cgroup/hadoop-yarn}}. > The problem is that docker also creates that same hierarchy under every > resource under {{/sys/fs/cgroup}}.
On the rhel7 system I am using, these > are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, > perf_event, and systemd. So for instance, docker creates > {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but > it only cleans up the leaf cgroup {{docker_container_id}}. Nobody cleans up > the {{container_id}} cgroups for these other resources. On one of our busy > clusters, we found > 100,000 of these leaked cgroups. > I found this in our 2.8-based version of hadoop, but I have been able to > repro with current hadoop.
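To make the leak described in YARN-8648 concrete, the sketch below walks every controller directory under a cgroup root and removes the leftover container_id directory where it can. It is only an illustration in Java using standard java.nio.file calls: the stopgap discussed in the comment removes the cgroups via the container-executor, and CgroupCleanupSketch, removeContainerCgroups and their parameters are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical cleanup helper; not part of Hadoop.
final class CgroupCleanupSketch {
    private CgroupCleanupSketch() {}

    /**
     * Removes {@code <cgroupRoot>/<controller>/<hierarchy>/<containerId>} for
     * every controller directory found under the root.
     *
     * @return the directories that were actually removed
     */
    static List<Path> removeContainerCgroups(Path cgroupRoot, String hierarchy,
                                             String containerId) {
        // "/hadoop-yarn" must become relative, otherwise resolve() would
        // discard the controller directory and use the absolute path.
        String relative = hierarchy.replaceFirst("^/+", "");
        List<Path> removed = new ArrayList<>();
        try (DirectoryStream<Path> controllers = Files.newDirectoryStream(cgroupRoot)) {
            for (Path controller : controllers) {
                if (!Files.isDirectory(controller)) {
                    continue;
                }
                Path candidate = controller.resolve(relative).resolve(containerId);
                try {
                    // Deleting a directory only succeeds once nothing is left
                    // beneath it, so a live docker cgroup is never removed.
                    if (Files.deleteIfExists(candidate)) {
                        removed.add(candidate);
                    }
                } catch (IOException stillInUse) {
                    // Child cgroup still present or cgroup busy: skip it.
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return removed;
    }
}
```

A caller would pass the cgroup mount point (for example /sys/fs/cgroup), the configured hierarchy and the YARN container id after the container has exited.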
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587634#comment-16587634 ] genericqa commented on YARN-7863: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} YARN-3409 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 22s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 54s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 13s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 13s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 3s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 38s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 35s{color} | {color:green} YARN-3409 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 44s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 52s{color} | {color:red} hadoop-yarn-api in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 24s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 53s{color} | {color:red} hadoop-yarn-applications-distributedshell in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}180m 45s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.api.resource.TestPlacementConstraintParser | | | hadoop.yarn.server.resourcemanager.TestRMHA | | | hadoop.yarn.applications.distributedshell.TestDistributedShell | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-7863 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936438/YARN-7863-YARN-3409.007.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux da294ff3ca12 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018
[jira] [Assigned] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie reassigned YARN-8649: --- Assignee: lujie Attachment: YARN-8649.patch Hi [~jlowe], [~pradeepambati],[~$iddhe$h] I have restudied the bug according to the logs. *The root cause:* # When the NM shuts down, it sends KILL_CONTAINER to the container. The log shows this event: {code:java} 2018-08-21 20:11:08,316 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1534853453424_0001_01_01 transitioned from LOCALIZING to KILLING {code} This causes the KillBeforeRunningTransition to execute. # KillBeforeRunningTransition calls "container.cleanup()", and the "cleanup" function sends "ContainerLocalizationCleanupEvent". # ContainerLocalizationCleanupEvent causes ResourceLocalizationService.handleCleanupContainerResources to execute, and "handleCleanupContainerResources" sends "ResourceReleaseEvent". # ResourceReleaseEvent causes LocalResourcesTrackerImpl.handle to execute, and handle (at line 199 in the source code) calls removeResource: {code:java} if (event.getType() == ResourceEventType.RELEASE) { if (rsrc.getState() == ResourceState.DOWNLOADING && rsrc.getRefCount() <= 0 && rsrc.getRequest().getVisibility() != LocalResourceVisibility.PUBLIC) { removeResource(req); } } {code} # removeResource does: {code:java} LocalizedResource rsrc = localrsrc.remove(req); {code} # When a heartbeat comes in, LocalResourcesTrackerImpl.getPathForLocalization does: {code:java} Path localPath = new Path(rPath, req.getPath().getName()); LocalizedResource rsrc = localrsrc.get(req);// rsrc is null rsrc.setLocalPath(localPath);//NPE {code} The NPE happens!
*Unit test:* While fixing YARN-4355, the patch added the test "testLocalizerHeartbeatWhenAppCleaningUp" in class "TestResourceLocalizationService". That test also sends the "ContainerLocalizationCleanupEvent", but it doesn't cover the case where a heartbeat arrives at this moment. In this patch, we change "testLocalizerHeartbeatWhenAppCleaningUp" to cover this situation. This change triggers the bug. Fixing: To fix the NPE, we only add a null check; I think that is suitable here. > Similar as YARN-4355:NPE while processing localizer heartbeat > - > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: lujie >Assignee: lujie >Priority: Major > Attachments: YARN-8649.patch, hadoop-hires-nodemanager-hadoop11.log > > > I have noticed that a nodemanager was getting NPEs while tearing down. The > reason may be similar to YARN-4355, which was reported by [# Jason Lowe].
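The race described in the root-cause list, and the null check proposed as the fix, can be reproduced in miniature as follows. This is a simplified stand-in, not the LocalResourcesTrackerImpl code or the attached patch: the class LocalResourcesTrackerSketch, its string keys and its methods are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the tracker; names and types are hypothetical.
final class LocalResourcesTrackerSketch {
    static final class Resource {
        volatile String localPath;
    }

    private final Map<String, Resource> localrsrc = new ConcurrentHashMap<>();

    void addResource(String req) {
        localrsrc.put(req, new Resource());
    }

    /** Models the RELEASE handling that removes a resource still being downloaded. */
    void removeResource(String req) {
        localrsrc.remove(req);
    }

    /**
     * Models the heartbeat path. Returns null when the resource was released
     * in the meantime, instead of dereferencing a missing entry.
     */
    String getPathForLocalization(String req, String baseDir) {
        Resource rsrc = localrsrc.get(req);
        if (rsrc == null) {
            // The container was killed and its resource removed before the
            // heartbeat arrived; without this check the next line would NPE.
            return null;
        }
        String localPath = baseDir + "/" + req;
        rsrc.localPath = localPath;
        return localPath;
    }
}
```

The caller of such a method then has to treat a null path as "nothing to localize for this request", which is the part a real patch would need to handle in the heartbeat response.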
[jira] [Comment Edited] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat
[ https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587597#comment-16587597 ] lujie edited comment on YARN-8649 at 8/21/18 3:35 PM: -- Hi [~jlowe], [~pradeepambati],[~$iddhe$h] I have restudied the bug according to the logs. *The root cause:* 1. When the NM shuts down, it sends KILL_CONTAINER to the container. The log shows this event: {code:java} 2018-08-21 20:11:08,316 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1534853453424_0001_01_01 transitioned from LOCALIZING to KILLING {code} This causes the KillBeforeRunningTransition to execute. 2. KillBeforeRunningTransition calls "container.cleanup()", and the "cleanup" function sends "ContainerLocalizationCleanupEvent". 3. ContainerLocalizationCleanupEvent causes ResourceLocalizationService.handleCleanupContainerResources to execute, and "handleCleanupContainerResources" sends "ResourceReleaseEvent". 4. ResourceReleaseEvent causes LocalResourcesTrackerImpl.handle to execute, and handle (at line 199 in the source code) calls removeResource: {code:java} if (event.getType() == ResourceEventType.RELEASE) { if (rsrc.getState() == ResourceState.DOWNLOADING && rsrc.getRefCount() <= 0 && rsrc.getRequest().getVisibility() != LocalResourceVisibility.PUBLIC) { removeResource(req); } } {code} 5. removeResource does: {code:java} LocalizedResource rsrc = localrsrc.remove(req); {code} 6. When a heartbeat comes in, LocalResourcesTrackerImpl.getPathForLocalization does: {code:java} Path localPath = new Path(rPath, req.getPath().getName()); LocalizedResource rsrc = localrsrc.get(req);// rsrc is null rsrc.setLocalPath(localPath);//NPE {code} The NPE happens!
*Unit test:* While fixing YARN-4355, the patch added the test "testLocalizerHeartbeatWhenAppCleaningUp" in class "TestResourceLocalizationService". That test also sends the "ContainerLocalizationCleanupEvent", but it doesn't cover the case where a heartbeat arrives at this moment. In this patch, we change "testLocalizerHeartbeatWhenAppCleaningUp" to cover this situation. This change triggers the bug. Fixing: To fix the NPE, we only add a null check; I think that is suitable here. was (Author: xiaoheipangzi): Hi [~jlowe], [~pradeepambati],[~$iddhe$h] I have restudied the bug according to the logs. *The root cause:* # When the NM shuts down, it sends KILL_CONTAINER to the container. The log shows this event: {code:java} 2018-08-21 20:11:08,316 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1534853453424_0001_01_01 transitioned from LOCALIZING to KILLING {code} This causes the KillBeforeRunningTransition to execute. # KillBeforeRunningTransition calls "container.cleanup()", and the "cleanup" function sends "ContainerLocalizationCleanupEvent". # ContainerLocalizationCleanupEvent causes ResourceLocalizationService.handleCleanupContainerResources to execute, and "handleCleanupContainerResources" sends "ResourceReleaseEvent".
# ResourceReleaseEvent causes LocalResourcesTrackerImpl.handle to execute, and handle (at line 199 in the source code) calls removeResource: {code:java} if (event.getType() == ResourceEventType.RELEASE) { if (rsrc.getState() == ResourceState.DOWNLOADING && rsrc.getRefCount() <= 0 && rsrc.getRequest().getVisibility() != LocalResourceVisibility.PUBLIC) { removeResource(req); } } {code} # removeResource does: {code:java} LocalizedResource rsrc = localrsrc.remove(req); {code} # When a heartbeat comes in, LocalResourcesTrackerImpl.getPathForLocalization does: {code:java} Path localPath = new Path(rPath, req.getPath().getName()); LocalizedResource rsrc = localrsrc.get(req);// rsrc is null rsrc.setLocalPath(localPath);//NPE {code} The NPE happens! *Unit test:* While fixing YARN-4355, the patch added the test "testLocalizerHeartbeatWhenAppCleaningUp" in class "TestResourceLocalizationService". That test also sends the "ContainerLocalizationCleanupEvent", but it doesn't cover the case where a heartbeat arrives at this moment. In this patch, we change "testLocalizerHeartbeatWhenAppCleaningUp" to cover this situation. This change triggers the bug. Fixing: To fix the NPE, we only add a null check; I think that is suitable here. > Similar as YARN-4355:NPE while processing localizer heartbeat > - > > Key: YARN-8649 > URL: https://issues.apache.org/jira/browse/YARN-8649 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >
[jira] [Commented] (YARN-7494) Add muti-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587591#comment-16587591 ] Hudson commented on YARN-7494: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14811 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14811/]) YARN-7494. Add muti-node lookup mechanism and pluggable nodes sorting (wwei: rev 9c3fc3ef2865164aa5f121793ac914cfeb21a181) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerNodeLabelUpdate.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/MultiNodeLookupPolicy.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAppSchedulingInfo.java * (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ClusterNodeTracker.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/MultiNodeSorter.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/ApplicationSchedulingConfig.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/MultiNodePolicySpec.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/MultiNodeSortingManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java * (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/ResourceUsageMultiNodeLookupPolicy.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/activities/ActivitiesLogger.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/activities/ActivitiesManager.java * (edit)
[jira] [Commented] (YARN-7494) Add muti-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587564#comment-16587564 ] Weiwei Yang commented on YARN-7494: --- I just pushed the patch to trunk, thanks for all the efforts [~sunilg], and also thanks for the reviews [~leftnoteasy]. > Add muti-node lookup mechanism and pluggable nodes sorting policies to > optimize placement decision > -- > > Key: YARN-7494 > URL: https://issues.apache.org/jira/browse/YARN-7494 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7494.001.patch, YARN-7494.002.patch, > YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, > YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch, > YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch, > YARN-7494.12.patch, YARN-7494.13.patch, YARN-7494.14.patch, > YARN-7494.15.patch, YARN-7494.16.patch, YARN-7494.17.patch, > YARN-7494.18.patch, YARN-7494.19.patch, YARN-7494.20.patch, > YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png > > > Instead of single node, for effectiveness we can consider a multi node lookup > based on partition to start with. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-7863: -- Fix Version/s: (was: 3.2.0) > Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7863-YARN-3409.002.patch, > YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, > YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, > YARN-7863-YARN-3409.007.patch, YARN-7863.v0.patch > > > This Jira will track to *Modify existing placement constraints to support > node attributes.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-7863: -- Comment: was deleted (was: I just pushed this to trunk. Thanks for getting it done [~sunilg]! And thanks for the reviews from [~leftnoteasy].) > Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-7863-YARN-3409.002.patch, > YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, > YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, > YARN-7863-YARN-3409.007.patch, YARN-7863.v0.patch > > > This Jira will track to *Modify existing placement constraints to support > node attributes.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Reopened] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reopened YARN-7863: --- > Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-7863-YARN-3409.002.patch, > YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, > YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, > YARN-7863-YARN-3409.007.patch, YARN-7863.v0.patch > > > This Jira will track to *Modify existing placement constraints to support > node attributes.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7494) Add muti-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587535#comment-16587535 ] Weiwei Yang commented on YARN-7494: --- LGTM, +1 to v20 patch. I will commit this to trunk shortly. Thanks [~sunilg] > Add muti-node lookup mechanism and pluggable nodes sorting policies to > optimize placement decision > -- > > Key: YARN-7494 > URL: https://issues.apache.org/jira/browse/YARN-7494 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7494.001.patch, YARN-7494.002.patch, > YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, > YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch, > YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch, > YARN-7494.12.patch, YARN-7494.13.patch, YARN-7494.14.patch, > YARN-7494.15.patch, YARN-7494.16.patch, YARN-7494.17.patch, > YARN-7494.18.patch, YARN-7494.19.patch, YARN-7494.20.patch, > YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png > > > Instead of single node, for effectiveness we can consider a multi node lookup > based on partition to start with. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7494) Add muti-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-7494: -- Summary: Add muti-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision (was: Add muti node lookup support for better placement) > Add muti-node lookup mechanism and pluggable nodes sorting policies to > optimize placement decision > -- > > Key: YARN-7494 > URL: https://issues.apache.org/jira/browse/YARN-7494 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7494.001.patch, YARN-7494.002.patch, > YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, > YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch, > YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch, > YARN-7494.12.patch, YARN-7494.13.patch, YARN-7494.14.patch, > YARN-7494.15.patch, YARN-7494.16.patch, YARN-7494.17.patch, > YARN-7494.18.patch, YARN-7494.19.patch, YARN-7494.20.patch, > YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png > > > Instead of single node, for effectiveness we can consider a multi node lookup > based on partition to start with. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8642) Add support for tmpfs mounts with the Docker runtime
[ https://issues.apache.org/jira/browse/YARN-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit reassigned YARN-8642: -- Assignee: Craig Condit > Add support for tmpfs mounts with the Docker runtime > > > Key: YARN-8642 > URL: https://issues.apache.org/jira/browse/YARN-8642 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Craig Condit >Priority: Major > Labels: Docker > > Add support to the existing Docker runtime to allow the user to request tmpfs > mounts for their containers. For example: > {code}/usr/bin/docker run --name=container_name --tmpfs /run image > /bootstrap/start-systemd > {code} > One use case is to allow systemd to run as PID 1 in a non-privileged > container, /run is expected to be a tmpfs mount in the container for that to > work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8642) Add support for tmpfs mounts with the Docker runtime
[ https://issues.apache.org/jira/browse/YARN-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587489#comment-16587489 ] Craig Condit commented on YARN-8642: [~shaneku...@gmail.com], I'd like to work on this. > Add support for tmpfs mounts with the Docker runtime > > > Key: YARN-8642 > URL: https://issues.apache.org/jira/browse/YARN-8642 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Priority: Major > Labels: Docker > > Add support to the existing Docker runtime to allow the user to request tmpfs > mounts for their containers. For example: > {code}/usr/bin/docker run --name=container_name --tmpfs /run image > /bootstrap/start-systemd > {code} > One use case is to allow systemd to run as PID 1 in a non-privileged > container, /run is expected to be a tmpfs mount in the container for that to > work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7680) ContainerMetrics is registered even if yarn.nodemanager.container-metrics.enable is set to false
[ https://issues.apache.org/jira/browse/YARN-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned YARN-7680: Assignee: Zoltan Siegl > ContainerMetrics is registered even if > yarn.nodemanager.container-metrics.enable is set to false > > > Key: YARN-7680 > URL: https://issues.apache.org/jira/browse/YARN-7680 > Project: Hadoop YARN > Issue Type: Bug > Components: metrics >Affects Versions: 3.0.0 >Reporter: Akira Ajisaka >Assignee: Zoltan Siegl >Priority: Critical > > ContainerMetrics is unintentionally registered to DefaultMetricsSystem even > if yarn.nodemanager.container-metrics.enable is set to false. For example, > when we set > *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31 to > sink all the metrics to Ganglia, MetricsSystem sink ContainerMetrics to > ganglia server (localhost:8649 by default). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
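The YARN-7680 report above describes a metrics source being registered even though its enable flag is false. A minimal sketch of the general guard, assuming a simplified registry: `GuardedMetricsRegistration`, `Registry`, and `registerIfEnabled` are hypothetical names for illustration and are not the Hadoop metrics2 API or the actual fix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class GuardedMetricsRegistration {
    private GuardedMetricsRegistration() {}

    /** Hypothetical stand-in for a metrics system; every registered source would reach the sinks. */
    public static final class Registry {
        private final Map<String, Object> sources = new LinkedHashMap<>();

        public void register(String name, Object source) {
            sources.put(name, source);
        }

        public boolean contains(String name) {
            return sources.containsKey(name);
        }
    }

    /**
     * Registers a per-container source only when the feature flag is on.
     * Returns true if the source was registered.
     */
    public static boolean registerIfEnabled(Registry registry, boolean enabled, String containerId) {
        if (!enabled) {
            return false; // flag off: nothing is registered, so no sink sees this source
        }
        registry.register("ContainerResource_" + containerId, new Object());
        return true;
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        System.out.println(registerIfEnabled(registry, false, "c1"));
        System.out.println(registerIfEnabled(registry, true, "c2"));
    }
}
```

The point of the sketch is only that the flag is checked before registration, rather than after the source already exists in the registry.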
[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587425#comment-16587425 ] genericqa commented on YARN-8468: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 33s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 8 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 44s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 11s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 27s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 31 new + 562 unchanged - 11 fixed = 593 total (was 573) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 28s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 17s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 18s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 30s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}145m 42s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMService | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.TestAppManager | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce
[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587422#comment-16587422 ] Antal Bálint Steinbach commented on YARN-8468: -- Hi [~wilfreds], Currently, resource types are not handled at all. The minimum and maximum checks are done only for vcores and memory. I think we can create a new ticket for this. > Limit container sizes per queue in FairScheduler > > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Critical > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers, cannot be limited by queue, and is not scheduler dependent. > > The goal of this ticket is to allow this value to be set on a per queue basis. > > The use case: User has two pools, one for ad hoc jobs and one for enterprise > apps. User wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. Setting > yarn.scheduler.maximum-allocation-mb sets a default maximum container size > for all queues, and the “maxContainerResources” queue config value sets the > maximum resources per queue. > > Suggested solution: > > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf), this will cover dynamically created queues. > * if we set it on the root we override the scheduler setting and we should > not allow that. > * make sure that queue resource cap cannot be larger than scheduler max > resource cap in the config. 
> * implement getMaximumResourceCapability(String queueName) in the > FairScheduler > * implement getMaximumResourceCapability() in both FSParentQueue and > FSLeafQueue as follows > * expose the setting in the queue information in the RM web UI. > * expose the setting in the metrics etc for the queue. > * write JUnit tests. > * update the scheduler documentation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
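The suggested solution in YARN-8468 above can be sketched roughly as follows: a queue-level maximum, when set, is capped by the scheduler-wide maximum, and a request is checked against the result. This is a minimal illustration under stated assumptions, not FairScheduler code: `QueueMaxAllocation`, `Res`, `effectiveMax`, and `fits` are hypothetical names, and only memory and vcores are modelled, matching the comment that other resource types are not yet handled.

```java
public final class QueueMaxAllocation {
    private QueueMaxAllocation() {}

    /** Hypothetical, simplified resource: memory in MB plus virtual cores. */
    public static final class Res {
        public final long memoryMb;
        public final int vcores;

        public Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /**
     * Returns the maximum container size that applies to a queue.
     * A null queueMax means the queue has no setting of its own,
     * so the scheduler-wide maximum is used unchanged.
     */
    public static Res effectiveMax(Res schedulerMax, Res queueMax) {
        if (queueMax == null) {
            return schedulerMax;
        }
        // The queue cap may never exceed the scheduler cap, per component.
        return new Res(
            Math.min(schedulerMax.memoryMb, queueMax.memoryMb),
            Math.min(schedulerMax.vcores, queueMax.vcores));
    }

    /** True if the request is within the given maximum in every modelled dimension. */
    public static boolean fits(Res request, Res max) {
        return request.memoryMb <= max.memoryMb && request.vcores <= max.vcores;
    }

    public static void main(String[] args) {
        Res schedulerMax = new Res(8192, 4);
        Res adHocMax = effectiveMax(schedulerMax, new Res(2048, 8));
        System.out.println(adHocMax.memoryMb + " MB, " + adHocMax.vcores + " vcores");
    }
}
```

Clamping is one possible policy; rejecting a queue configuration that exceeds the scheduler maximum would be an equally valid reading of the ticket.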
[jira] [Commented] (YARN-7494) Add muti node lookup support for better placement
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587389#comment-16587389 ] Sunil Govindan commented on YARN-7494: -- [~cheersyang] Fixed the checkstyle issues that could be fixed. Some line-length warnings can't be addressed because they come from long names etc. Also, that class doesn't need a setter and getter. Please check. > Add muti node lookup support for better placement > - > > Key: YARN-7494 > URL: https://issues.apache.org/jira/browse/YARN-7494 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7494.001.patch, YARN-7494.002.patch, > YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, > YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch, > YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch, > YARN-7494.12.patch, YARN-7494.13.patch, YARN-7494.14.patch, > YARN-7494.15.patch, YARN-7494.16.patch, YARN-7494.17.patch, > YARN-7494.18.patch, YARN-7494.19.patch, YARN-7494.20.patch, > YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png > > > Instead of single node, for effectiveness we can consider a multi node lookup > based on partition to start with. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7494) Add muti node lookup support for better placement
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587386#comment-16587386 ] genericqa commented on YARN-7494: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 33s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 41s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 6 new + 670 unchanged - 4 fixed = 676 total (was 674) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 12s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 47s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}126m 35s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-7494 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936423/YARN-7494.20.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 469a0c4aa9e0 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d3fef7a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21647/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21647/testReport/ | | Max. process+thread count | 869 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Updated] (YARN-8695) ERROR: Container complete event for unknown container id
[ https://issues.apache.org/jira/browse/YARN-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8695: - Priority: Minor (was: Major) Downgrading the priority since this has no impact on functionality. Does "container id container_1534394833079_0012_01_06" appear earlier in the AM log? This may simply be a case where the AM has already decided to forget about a container it used earlier to run a task, and complains when the RM informs the AM of the completion of that container. If that is indeed what is happening then this bug should be moved to the MAPREDUCE project, as it would be a bug in the MapReduce AM code rather than YARN. > ERROR: Container complete event for unknown container id > > > Key: YARN-8695 > URL: https://issues.apache.org/jira/browse/YARN-8695 > Project: Hadoop YARN > Issue Type: Bug > Components: RM >Reporter: sivasankar >Priority: Minor > > Have deployed a cluster with *3 data nodes*. YARN/MapReduce2/HDFS version is > *2.7.3* on HDP. While running teragen and Gobblin the following Yarn errors > get reported in the logs. Errors get reported only when the map tasks defined > for the job less than or equals to the number of data nodes in the cluster. > For *Teragen* -Dmapreduce.job.maps=4 > For *Gobblin* mr.job.max.mappers=4 > There are no errors if the map tasks(splits) are <= number of data nodes. 
> 2018-08-16 06:54:05,681 ERROR [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: *Container > complete event for unknown container id > container_1534394833079_0012_01_06* > 2018-08-16 05:00:50,138 ERROR [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container > complete event for unknown container id > container_1534394833079_0001_01_55 2018-08-16 05:00:50,138 INFO > [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received > completed container container_1534394833079_0001_01_54 2018-08-16 > 05:00:50,138 ERROR [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container > complete event for unknown container id > container_1534394833079_0001_01_54 2018-08-16 05:00:50,138 INFO > [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received > completed container container_1534394833079_0001_01_53 2018-08-16 > 05:00:50,138 ERROR [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container > complete event for unknown container id container_1534394833079_0001_01_53 > *Note*: There is no functionality issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587371#comment-16587371 ] Sunil Govindan commented on YARN-7863: -- Thanks [~cheersyang] [~Naganarasimha] I think *test cases* for DS could be added in another patch, which will cover all DS-level cases. I'll add AND and OR cases in this one. Other cases are covered. > Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7863-YARN-3409.002.patch, > YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, > YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, > YARN-7863-YARN-3409.007.patch, YARN-7863.v0.patch > > > This Jira will track to *Modify existing placement constraints to support > node attributes.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-7863: - Attachment: YARN-7863-YARN-3409.007.patch > Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7863-YARN-3409.002.patch, > YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, > YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, > YARN-7863-YARN-3409.007.patch, YARN-7863.v0.patch > > > This Jira will track to *Modify existing placement constraints to support > node attributes.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8695) ERROR: Container complete event for unknown container id
sivasankar created YARN-8695: Summary: ERROR: Container complete event for unknown container id Key: YARN-8695 URL: https://issues.apache.org/jira/browse/YARN-8695 Project: Hadoop YARN Issue Type: Bug Components: RM Reporter: sivasankar Have deployed a cluster with *3 data nodes*. YARN/MapReduce2/HDFS version is *2.7.3* on HDP. While running teragen and Gobblin the following Yarn errors get reported in the logs. Errors get reported only when the map tasks defined for the job less than or equals to the number of data nodes in the cluster. For *Teragen* -Dmapreduce.job.maps=4 For *Gobblin* mr.job.max.mappers=4 There are no errors if the map tasks(splits) are <= number of data nodes. 2018-08-16 06:54:05,681 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: *Container complete event for unknown container id container_1534394833079_0012_01_06* 2018-08-16 05:00:50,138 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_1534394833079_0001_01_55 2018-08-16 05:00:50,138 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1534394833079_0001_01_54 2018-08-16 05:00:50,138 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_1534394833079_0001_01_54 2018-08-16 05:00:50,138 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1534394833079_0001_01_53 2018-08-16 05:00:50,138 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_1534394833079_0001_01_53 *Note*: There is no functionality issue. 
[jira] [Updated] (YARN-8694) app flex with relative changes does not work
[ https://issues.apache.org/jira/browse/YARN-8694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kyungwan nam updated YARN-8694: --- Attachment: YARN-8694.001.patch > app flex with relative changes does not work > > > Key: YARN-8694 > URL: https://issues.apache.org/jira/browse/YARN-8694 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.1 >Reporter: kyungwan nam >Priority: Major > Attachments: YARN-8694.001.patch > > > I'd like to increase the number of containers by 2, as below. > {code:java} > yarn app -flex my-sleeper -component sleeper +2{code} > But it did not work; it seems to request 2 containers, not +2. > > ApiServiceClient.actionFlex > {code:java} > @Override > public int actionFlex(String appName, Map<String, String> componentCounts) > throws IOException, YarnException { > int result = EXIT_SUCCESS; > try { > Service service = new Service(); > service.setName(appName); > service.setState(ServiceState.FLEX); > for (Map.Entry<String, String> entry : componentCounts.entrySet()) { > Component component = new Component(); > component.setName(entry.getKey()); > Long numberOfContainers = Long.parseLong(entry.getValue()); > component.setNumberOfContainers(numberOfContainers); > service.addComponent(component); > } > String buffer = jsonSerDeser.toJson(service); > ClientResponse response = getApiClient(getServicePath(appName)) > .put(ClientResponse.class, buffer);{code} > It looks like there is no code that handles "+" or "-" in > ApiServiceClient.actionFlex -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8685) Add containers query support for nodes/node REST API in RMWebServices
[ https://issues.apache.org/jira/browse/YARN-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587317#comment-16587317 ] Weiwei Yang commented on YARN-8685: --- Hi [~Tao Yang] Can we add a new endpoint in RM, something like {noformat} http:///ws/v1/cluster/containers/{nodeId}?states=ALLOCATED {noformat} to display RM containers, where the query parameter {{states}} is an optional filter, a comma-separated list of states? This way we avoid returning too much info in a single API. Second, can we pull out the {{ContainerInfo}} to {{hadoop-yarn-common/o.a.h.y.webapp.dao}} so it can be shared by both RM and NM containers endpoints? > Add containers query support for nodes/node REST API in RMWebServices > - > > Key: YARN-8685 > URL: https://issues.apache.org/jira/browse/YARN-8685 > Project: Hadoop YARN > Issue Type: Improvement > Components: restapi >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8685.001.patch > > > Currently we can only query running containers from NM containers REST API, > but can't get the valid containers which are in ALLOCATED/ACQUIRED state. We > have the requirements to get all containers allocated on specified nodes for > debugging. I want to add a "includeContainers" query param (default false) > for nodes/node REST API in RMWebServices, so that we can get valid containers > on nodes if "includeContainers=true" specified. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
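The optional states filter suggested in the YARN-8685 comment above could be parsed roughly as follows. This is only a minimal sketch, not the patch: the class name, the parseStates helper and the State enum are hypothetical stand-ins (a real implementation would presumably use the RM's own container state type), and treating a missing parameter as "no filtering" is an assumption.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of parsing a comma-separated "states" query parameter.
final class ContainerStateFilterSketch {

  // Stand-in for the real container state enum; the values are illustrative only.
  enum State { NEW, ALLOCATED, ACQUIRED, RUNNING, COMPLETED }

  // Returns the set of states to keep. A null or blank parameter means
  // "no filter" (assumption), so every state is returned.
  static Set<State> parseStates(String param) {
    if (param == null || param.trim().isEmpty()) {
      return EnumSet.allOf(State.class);
    }
    EnumSet<State> result = EnumSet.noneOf(State.class);
    for (String token : param.split(",")) {
      String name = token.trim().toUpperCase(Locale.ROOT);
      if (!name.isEmpty()) {
        // valueOf throws IllegalArgumentException for an unknown state name.
        result.add(State.valueOf(name));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(parseStates("ALLOCATED,ACQUIRED"));
  }
}
```

An unknown state name surfaces as an IllegalArgumentException here; a REST endpoint would more likely translate that into a bad-request response.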
[jira] [Commented] (YARN-8683) Support to display pending scheduling requests in RM app attempt page
[ https://issues.apache.org/jira/browse/YARN-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587312#comment-16587312 ] Hudson commented on YARN-8683: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14810 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14810/]) YARN-8683. Support to display pending scheduling requests in RM app (wwei: rev 54d0bf8935e35aad0f4d67df358ceb970cfcd713) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/AppPlacementAllocator.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/LocalityAppPlacementAllocator.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/SingleConstraintAppPlacementAllocator.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ResourceRequestInfo.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * 
(edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java > Support to display pending scheduling requests in RM app attempt page > -- > > Key: YARN-8683 > URL: https://issues.apache.org/jira/browse/YARN-8683 > Project: Hadoop YARN > Issue Type: Improvement > Components: webapp >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8683.001.patch, YARN-8683.002.patch, > YARN-8683.003.patch, YARN-8683.004.patch, screenshot-1.png, screenshot-2.png > > > Currently outstanding requests info in app attempt page only show pending > resource requests, pending scheduling requests should be shown here too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8683) Support to display pending scheduling requests in RM app attempt page
[ https://issues.apache.org/jira/browse/YARN-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587300#comment-16587300 ] Tao Yang commented on YARN-8683: Thanks [~cheersyang]. > Support to display pending scheduling requests in RM app attempt page > -- > > Key: YARN-8683 > URL: https://issues.apache.org/jira/browse/YARN-8683 > Project: Hadoop YARN > Issue Type: Improvement > Components: webapp >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8683.001.patch, YARN-8683.002.patch, > YARN-8683.003.patch, YARN-8683.004.patch, screenshot-1.png, screenshot-2.png > > > Currently outstanding requests info in app attempt page only show pending > resource requests, pending scheduling requests should be shown here too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8683) Support to display pending scheduling requests in RM app attempt page
[ https://issues.apache.org/jira/browse/YARN-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587295#comment-16587295 ] Weiwei Yang commented on YARN-8683: --- Thanks [~Tao Yang] for the contribution, I have committed this to trunk. > Support to display pending scheduling requests in RM app attempt page > -- > > Key: YARN-8683 > URL: https://issues.apache.org/jira/browse/YARN-8683 > Project: Hadoop YARN > Issue Type: Improvement > Components: webapp >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8683.001.patch, YARN-8683.002.patch, > YARN-8683.003.patch, YARN-8683.004.patch, screenshot-1.png, screenshot-2.png > > > Currently outstanding requests info in app attempt page only show pending > resource requests, pending scheduling requests should be shown here too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8683) Support to display pending scheduling requests in RM app attempt page
[ https://issues.apache.org/jira/browse/YARN-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8683: -- Issue Type: Improvement (was: Bug) > Support to display pending scheduling requests in RM app attempt page > -- > > Key: YARN-8683 > URL: https://issues.apache.org/jira/browse/YARN-8683 > Project: Hadoop YARN > Issue Type: Improvement > Components: webapp >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8683.001.patch, YARN-8683.002.patch, > YARN-8683.003.patch, YARN-8683.004.patch, screenshot-1.png, screenshot-2.png > > > Currently outstanding requests info in app attempt page only show pending > resource requests, pending scheduling requests should be shown here too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8683) Support to display pending scheduling requests in RM app attempt page
[ https://issues.apache.org/jira/browse/YARN-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8683: -- Summary: Support to display pending scheduling requests in RM app attempt page (was: Display pending scheduling requests in RM app attempt page ) > Support to display pending scheduling requests in RM app attempt page > -- > > Key: YARN-8683 > URL: https://issues.apache.org/jira/browse/YARN-8683 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8683.001.patch, YARN-8683.002.patch, > YARN-8683.003.patch, YARN-8683.004.patch, screenshot-1.png, screenshot-2.png > > > Currently outstanding requests info in app attempt page only show pending > resource requests, pending scheduling requests should be shown here too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8694) app flex with relative changes does not work
kyungwan nam created YARN-8694: -- Summary: app flex with relative changes does not work Key: YARN-8694 URL: https://issues.apache.org/jira/browse/YARN-8694 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Affects Versions: 3.1.1 Reporter: kyungwan nam I'd like to increase the number of containers by 2, as below. {code:java} yarn app -flex my-sleeper -component sleeper +2{code} But it did not work; it seems to request 2 containers, not +2. ApiServiceClient.actionFlex {code:java} @Override public int actionFlex(String appName, Map<String, String> componentCounts) throws IOException, YarnException { int result = EXIT_SUCCESS; try { Service service = new Service(); service.setName(appName); service.setState(ServiceState.FLEX); for (Map.Entry<String, String> entry : componentCounts.entrySet()) { Component component = new Component(); component.setName(entry.getKey()); Long numberOfContainers = Long.parseLong(entry.getValue()); component.setNumberOfContainers(numberOfContainers); service.addComponent(component); } String buffer = jsonSerDeser.toJson(service); ClientResponse response = getApiClient(getServicePath(appName)) .put(ClientResponse.class, buffer);{code} It looks like there is no code that handles "+" or "-" in ApiServiceClient.actionFlex -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
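For illustration, the relative form reported in YARN-8694 could be resolved along these lines. This is a sketch under stated assumptions, not the attached patch: resolveContainerCount and its currentCount argument are hypothetical and not part of ApiServiceClient, and the sketch assumes the caller already knows the component's current container count. Note that Long.parseLong accepts a leading "+", which is consistent with "+2" being read as the absolute value 2.

```java
// Hypothetical sketch: turn "+2", "-1" or "5" into an absolute container count.
final class FlexCountSketch {

  // requested: the raw value from the command line.
  // currentCount: the component's current number of containers (assumed known).
  static long resolveContainerCount(String requested, long currentCount) {
    String value = requested.trim();
    if (value.startsWith("+") || value.startsWith("-")) {
      // Relative change: Long.parseLong handles the sign itself.
      long target = Math.addExact(currentCount, Long.parseLong(value));
      if (target < 0) {
        throw new IllegalArgumentException(
            "Resulting container count would be negative: " + target);
      }
      return target;
    }
    // No sign: treat the value as an absolute count.
    return Long.parseLong(value);
  }

  public static void main(String[] args) {
    System.out.println(resolveContainerCount("+2", 3));
  }
}
```

Whether the delta is best resolved on the client or on the server, which holds the authoritative count, is a design choice the sketch does not settle.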
[jira] [Updated] (YARN-8683) Display pending scheduling requests in RM app attempt page
[ https://issues.apache.org/jira/browse/YARN-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8683: -- Summary: Display pending scheduling requests in RM app attempt page (was: Display outstanding pending scheduling requests in RM app attempt page ) > Display pending scheduling requests in RM app attempt page > --- > > Key: YARN-8683 > URL: https://issues.apache.org/jira/browse/YARN-8683 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8683.001.patch, YARN-8683.002.patch, > YARN-8683.003.patch, YARN-8683.004.patch, screenshot-1.png, screenshot-2.png > > > Currently outstanding requests info in app attempt page only show pending > resource requests, pending scheduling requests should be shown here too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8683) Display outstanding pending scheduling requests in RM app attempt page
[ https://issues.apache.org/jira/browse/YARN-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8683: -- Summary: Display outstanding pending scheduling requests in RM app attempt page (was: Support scheduling request for outstanding requests info in RMAppAttemptBlock) > Display outstanding pending scheduling requests in RM app attempt page > --- > > Key: YARN-8683 > URL: https://issues.apache.org/jira/browse/YARN-8683 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8683.001.patch, YARN-8683.002.patch, > YARN-8683.003.patch, YARN-8683.004.patch, screenshot-1.png, screenshot-2.png > > > Currently outstanding requests info in app attempt page only show pending > resource requests, pending scheduling requests should be shown here too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8683) Support scheduling request for outstanding requests info in RMAppAttemptBlock
[ https://issues.apache.org/jira/browse/YARN-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587287#comment-16587287 ] Weiwei Yang commented on YARN-8683: --- +1, committing now > Support scheduling request for outstanding requests info in RMAppAttemptBlock > - > > Key: YARN-8683 > URL: https://issues.apache.org/jira/browse/YARN-8683 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8683.001.patch, YARN-8683.002.patch, > YARN-8683.003.patch, YARN-8683.004.patch, screenshot-1.png, screenshot-2.png > > > Currently outstanding requests info in app attempt page only show pending > resource requests, pending scheduling requests should be shown here too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-8468: - Attachment: YARN-8468.004.patch > Limit container sizes per queue in FairScheduler > > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Critical > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers, cannot be limited per queue, and is not scheduler dependent. > > The goal of this ticket is to allow this value to be set on a per-queue basis. > > The use case: User has two pools, one for ad hoc jobs and one for enterprise > apps. User wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. Setting > yarn.scheduler.maximum-allocation-mb sets a default maximum > container size for all queues, and the maximum resources per queue are set with > the “maxContainerResources” queue config value. > > Suggested solution: > > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf), this will cover dynamically created queues. > * if we set it on the root we override the scheduler setting and we should > not allow that. > * make sure that the queue resource cap cannot be larger than the scheduler max > resource cap in the config. 
> * implement getMaximumResourceCapability(String queueName) in the > FairScheduler > * implement getMaximumResourceCapability() in both FSParentQueue and > FSLeafQueue as follows > * expose the setting in the queue information in the RM web UI. > * expose the setting in the metrics etc for the queue. > * write JUnit tests. > * update the scheduler documentation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
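The capping rule listed in the YARN-8468 description (a queue's cap may not exceed the scheduler-wide cap) could look roughly like the sketch below. Everything here is hypothetical and self-contained: Res and Queue are simplified stand-ins for the scheduler's resource and queue types, and inheriting the parent's value when a queue has no setting is an assumption drawn from the remark about dynamically created queues, not something the ticket states.

```java
// Hypothetical sketch of a per-queue maximum container size, capped by the
// scheduler-wide maximum. Not the FairScheduler implementation.
final class QueueMaxAllocationSketch {

  // Simplified stand-in for a YARN resource (memory in MB plus vcores).
  static final class Res {
    final long memoryMb;
    final int vcores;

    Res(long memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }
  }

  // Simplified stand-in for a parent or leaf queue.
  static final class Queue {
    final Queue parent;               // null for the root queue
    final Res maxContainerResources;  // null when the queue has no setting

    Queue(Queue parent, Res maxContainerResources) {
      this.parent = parent;
      this.maxContainerResources = maxContainerResources;
    }

    // Effective cap: the queue's own setting if present, otherwise the
    // nearest ancestor's (assumption), never larger than the scheduler cap.
    Res getMaximumResourceCapability(Res schedulerMax) {
      if (maxContainerResources == null) {
        return parent == null
            ? schedulerMax
            : parent.getMaximumResourceCapability(schedulerMax);
      }
      return new Res(
          Math.min(maxContainerResources.memoryMb, schedulerMax.memoryMb),
          Math.min(maxContainerResources.vcores, schedulerMax.vcores));
    }
  }

  public static void main(String[] args) {
    Res schedulerMax = new Res(8192, 8);
    Queue root = new Queue(null, null);
    Queue adhoc = new Queue(root, new Res(2048, 2));
    Res cap = adhoc.getMaximumResourceCapability(schedulerMax);
    System.out.println(cap.memoryMb + " MB, " + cap.vcores + " vcores");
  }
}
```

In the use case from the description, the ad hoc queue would carry a small setting while the enterprise queue would carry none and fall back to the scheduler-wide value.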
[jira] [Updated] (YARN-7494) Add muti node lookup support for better placement
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-7494: - Attachment: YARN-7494.20.patch > Add muti node lookup support for better placement > - > > Key: YARN-7494 > URL: https://issues.apache.org/jira/browse/YARN-7494 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7494.001.patch, YARN-7494.002.patch, > YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, > YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch, > YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch, > YARN-7494.12.patch, YARN-7494.13.patch, YARN-7494.14.patch, > YARN-7494.15.patch, YARN-7494.16.patch, YARN-7494.17.patch, > YARN-7494.18.patch, YARN-7494.19.patch, YARN-7494.20.patch, > YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png > > > Instead of single node, for effectiveness we can consider a multi node lookup > based on partition to start with. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-8468: - Attachment: (was: YARN-8468.005.patch) > Limit container sizes per queue in FairScheduler > > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Critical > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers, cannot be limited per queue, and is not scheduler dependent. > > The goal of this ticket is to allow this value to be set on a per-queue basis. > > The use case: User has two pools, one for ad hoc jobs and one for enterprise > apps. User wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. Setting > yarn.scheduler.maximum-allocation-mb sets a default maximum > container size for all queues, and the maximum resources per queue are set with > the “maxContainerResources” queue config value. > > Suggested solution: > > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf), this will cover dynamically created queues. > * if we set it on the root we override the scheduler setting and we should > not allow that. > * make sure that the queue resource cap cannot be larger than the scheduler max > resource cap in the config. 
> * implement getMaximumResourceCapability(String queueName) in the > FairScheduler > * implement getMaximumResourceCapability() in both FSParentQueue and > FSLeafQueue as follows > * expose the setting in the queue information in the RM web UI. > * expose the setting in the metrics etc for the queue. > * write JUnit tests. > * update the scheduler documentation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-8468: - Attachment: YARN-8468.005.patch > Limit container sizes per queue in FairScheduler > > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Critical > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.005.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers, cannot be limited per queue, and is not scheduler dependent. > > The goal of this ticket is to allow this value to be set on a per-queue basis. > > The use case: User has two pools, one for ad hoc jobs and one for enterprise > apps. User wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. Setting > yarn.scheduler.maximum-allocation-mb sets a default maximum > container size for all queues, and the maximum resources per queue are set with > the “maxContainerResources” queue config value. > > Suggested solution: > > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf), this will cover dynamically created queues. > * if we set it on the root we override the scheduler setting and we should > not allow that. > * make sure that the queue resource cap cannot be larger than the scheduler max > resource cap in the config. 
> * implement getMaximumResourceCapability(String queueName) in the > FairScheduler > * implement getMaximumResourceCapability() in both FSParentQueue and > FSLeafQueue as follows > * expose the setting in the queue information in the RM web UI. > * expose the setting in the metrics etc for the queue. > * write JUnit tests. > * update the scheduler documentation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8683) Support scheduling request for outstanding requests info in RMAppAttemptBlock
[ https://issues.apache.org/jira/browse/YARN-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587266#comment-16587266 ] genericqa commented on YARN-8683: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 22s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 5s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 76m 44s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}125m 6s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8683 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936398/YARN-8683.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 53a57a3a4bbf 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d3fef7a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21644/testReport/ | | Max. process+thread count | 885 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21644/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT
[jira] [Created] (YARN-8693) Add signalToContainer REST API for RMWebServices
Tao Yang created YARN-8693: -- Summary: Add signalToContainer REST API for RMWebServices Key: YARN-8693 URL: https://issues.apache.org/jira/browse/YARN-8693 Project: Hadoop YARN Issue Type: Improvement Components: restapi Affects Versions: 3.2.0 Reporter: Tao Yang Assignee: Tao Yang Currently YARN has an RPC command, "yarn container -signal ", to send OUTPUT_THREAD_DUMP/GRACEFUL_SHUTDOWN/FORCEFUL_SHUTDOWN signals to a container. That is not enough; we need to add a signalToContainer REST API for better management by cluster administrators or management systems. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8692) Support node utilization metrics for SLS
[ https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8692: --- Description: The distribution of node utilization is an important healthy factor for the YARN cluster, related metrics in SLS can be used to evaluate the scheduling effects and optimize related configurations. To implement this improvement, we need to do things as below: (1) Add input configurations (contain avg and stddev for cpu/memory utilization ratio) and generate utilization samples for tasks, not include AM container cause I think it's negligible. (2) Simulate containers and node utilization within node status. (3) calculate and generate the distribution metrics and use standard deviation metric (stddev for short) to evaluate the effects(smaller is better). (4) show these metrics on SLS simulator page like this: !image-2018-08-21-18-04-22-749.png! For Node memory/CPU utilization distribution graphs, Y-axis is nodes number, and P0 represents 0%~9% utilization ratio(containers-utilization / node-total-resource), P1 represents 10%~19% utilization ratio, P2 represents 20%~29% utilization ratio, ..., at last P9 represents 90%~100% utilization ratio. was: The distribution of node utilization is an important healthy factor for the YARN cluster, related metrics in SLS can be used to evaluate the scheduling effects and optimize related configurations. To implement this improvement, we need to do things as below: (1) Add input configurations (contain avg and stddev for cpu/memory utilization ratio) and generate utilization samples for tasks, not include AM container cause I think it's negligible. (2) Simulate containers and node utilization within node status. (3) calculate and generate the distribution metrics and use standard deviation metric (stddev for short) to evaluate the effects(smaller is better). (4) show these metrics on SLS simulator page like this: !image-2018-08-21-18-04-22-749.png! 
For Node memory/CPU utilization distribution graphs, Y-axis is nodes number, and P0 represents 0%~9% utilization ratio(containers-utilization / node-total-resource), P1 represents 10%~19% utilization ratio, P2 represents 20%~29% utilization ratio, ..., at last P9 represents 90%~100% utilization ratio. > Support node utilization metrics for SLS > > > Key: YARN-8692 > URL: https://issues.apache.org/jira/browse/YARN-8692 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler-load-simulator >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: image-2018-08-21-18-04-22-749.png > > > The distribution of node utilization is an important healthy factor for the > YARN cluster, related metrics in SLS can be used to evaluate the scheduling > effects and optimize related configurations. > To implement this improvement, we need to do things as below: > (1) Add input configurations (contain avg and stddev for cpu/memory > utilization ratio) and generate utilization samples for tasks, not include AM > container cause I think it's negligible. > (2) Simulate containers and node utilization within node status. > (3) calculate and generate the distribution metrics and use standard > deviation metric (stddev for short) to evaluate the effects(smaller is > better). > (4) show these metrics on SLS simulator page like this: > !image-2018-08-21-18-04-22-749.png! > For Node memory/CPU utilization distribution graphs, Y-axis is nodes number, > and P0 represents 0%~9% utilization ratio(containers-utilization / > node-total-resource), P1 represents 10%~19% utilization ratio, P2 represents > 20%~29% utilization ratio, ..., at last P9 represents 90%~100% utilization > ratio. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8692) Support node utilization metrics for SLS
[ https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8692: --- Attachment: (was: image-2018-08-21-18-03-59-665.png) > Support node utilization metrics for SLS > > > Key: YARN-8692 > URL: https://issues.apache.org/jira/browse/YARN-8692 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler-load-simulator >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: image-2018-08-21-18-04-22-749.png > > > The distribution of node utilization is an important healthy factor for the > YARN cluster, related metrics in SLS can be used to evaluate the scheduling > effects and optimize related configurations. > To implement this improvement, we need to do things as below: > (1) Add input configurations (contain avg and stddev for cpu/memory > utilization ratio) and generate utilization samples for tasks, not include AM > container cause I think it's negligible. (2) Simulate containers and node > utilization within node status. > (3) calculate and generate the distribution metrics and use standard > deviation metric (stddev for short) to evaluate the effects(smaller is > better). > (4) show these metrics on SLS simulator page like this: > !image-2018-08-21-18-04-22-749.png! > For Node memory/CPU utilization distribution graphs, Y-axis is nodes number, > and P0 represents 0%~9% utilization ratio(containers-utilization / > node-total-resource), P1 represents 10%~19% utilization ratio, P2 represents > 20%~29% utilization ratio, ..., at last P9 represents 90%~100% utilization > ratio. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8692) Support node utilization metrics for SLS
[ https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8692: --- Description: The distribution of node utilization is an important healthy factor for the YARN cluster, related metrics in SLS can be used to evaluate the scheduling effects and optimize related configurations. To implement this improvement, we need to do things as below: (1) Add input configurations (contain avg and stddev for cpu/memory utilization ratio) and generate utilization samples for tasks, not include AM container cause I think it's negligible. (2) Simulate containers and node utilization within node status. (3) calculate and generate the distribution metrics and use standard deviation metric (stddev for short) to evaluate the effects(smaller is better). (4) show these metrics on SLS simulator page like this: !image-2018-08-21-18-04-22-749.png! For Node memory/CPU utilization distribution graphs, Y-axis is nodes number, and P0 represents 0%~9% utilization ratio(containers-utilization / node-total-resource), P1 represents 10%~19% utilization ratio, P2 represents 20%~29% utilization ratio, ..., at last P9 represents 90%~100% utilization ratio. was: The distribution of node utilization is an important healthy factor for the YARN cluster, related metrics in SLS can be used to evaluate the scheduling effects and optimize related configurations. To implement this improvement, we need to do things as below: (1) Add input configurations (contain avg and stddev for cpu/memory utilization ratio) and generate utilization samples for tasks, not include AM container cause I think it's negligible. (2) Simulate containers and node utilization within node status. (3) calculate and generate the distribution metrics and use standard deviation metric (stddev for short) to evaluate the effects(smaller is better). (4) show these metrics on SLS simulator page like this: !image-2018-08-21-17-50-04-011.png! 
For Node memory/CPU utilization distribution graphs, Y-axis is nodes number, and P0 represents 0%~9% utilization ratio(containers-utilization / node-total-resource), P1 represents 10%~19% utilization ratio, P2 represents 20%~29% utilization ratio, ..., at last P9 represents 90%~100% utilization ratio. > Support node utilization metrics for SLS > > > Key: YARN-8692 > URL: https://issues.apache.org/jira/browse/YARN-8692 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler-load-simulator >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: image-2018-08-21-18-03-59-665.png, > image-2018-08-21-18-04-22-749.png > > > The distribution of node utilization is an important healthy factor for the > YARN cluster, related metrics in SLS can be used to evaluate the scheduling > effects and optimize related configurations. > To implement this improvement, we need to do things as below: > (1) Add input configurations (contain avg and stddev for cpu/memory > utilization ratio) and generate utilization samples for tasks, not include AM > container cause I think it's negligible. (2) Simulate containers and node > utilization within node status. > (3) calculate and generate the distribution metrics and use standard > deviation metric (stddev for short) to evaluate the effects(smaller is > better). > (4) show these metrics on SLS simulator page like this: > !image-2018-08-21-18-04-22-749.png! > For Node memory/CPU utilization distribution graphs, Y-axis is nodes number, > and P0 represents 0%~9% utilization ratio(containers-utilization / > node-total-resource), P1 represents 10%~19% utilization ratio, P2 represents > 20%~29% utilization ratio, ..., at last P9 represents 90%~100% utilization > ratio. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8692) Support node utilization metrics for SLS
[ https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8692: --- Attachment: image-2018-08-21-18-04-22-749.png > Support node utilization metrics for SLS > > > Key: YARN-8692 > URL: https://issues.apache.org/jira/browse/YARN-8692 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler-load-simulator >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: image-2018-08-21-18-03-59-665.png, > image-2018-08-21-18-04-22-749.png > > > The distribution of node utilization is an important healthy factor for the > YARN cluster, related metrics in SLS can be used to evaluate the scheduling > effects and optimize related configurations. > To implement this improvement, we need to do things as below: > (1) Add input configurations (contain avg and stddev for cpu/memory > utilization ratio) and generate utilization samples for tasks, not include AM > container cause I think it's negligible. (2) Simulate containers and node > utilization within node status. > (3) calculate and generate the distribution metrics and use standard > deviation metric (stddev for short) to evaluate the effects(smaller is > better). > (4) show these metrics on SLS simulator page like this: > !image-2018-08-21-17-50-04-011.png! > For Node memory/CPU utilization distribution graphs, Y-axis is nodes number, > and P0 represents 0%~9% utilization ratio(containers-utilization / > node-total-resource), P1 represents 10%~19% utilization ratio, P2 represents > 20%~29% utilization ratio, ..., at last P9 represents 90%~100% utilization > ratio. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8692) Support node utilization metrics for SLS
[ https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8692: --- Attachment: image-2018-08-21-18-03-59-665.png > Support node utilization metrics for SLS > > > Key: YARN-8692 > URL: https://issues.apache.org/jira/browse/YARN-8692 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler-load-simulator >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: image-2018-08-21-18-03-59-665.png, > image-2018-08-21-18-04-22-749.png > > > The distribution of node utilization is an important healthy factor for the > YARN cluster, related metrics in SLS can be used to evaluate the scheduling > effects and optimize related configurations. > To implement this improvement, we need to do things as below: > (1) Add input configurations (contain avg and stddev for cpu/memory > utilization ratio) and generate utilization samples for tasks, not include AM > container cause I think it's negligible. (2) Simulate containers and node > utilization within node status. > (3) calculate and generate the distribution metrics and use standard > deviation metric (stddev for short) to evaluate the effects(smaller is > better). > (4) show these metrics on SLS simulator page like this: > !image-2018-08-21-17-50-04-011.png! > For Node memory/CPU utilization distribution graphs, Y-axis is nodes number, > and P0 represents 0%~9% utilization ratio(containers-utilization / > node-total-resource), P1 represents 10%~19% utilization ratio, P2 represents > 20%~29% utilization ratio, ..., at last P9 represents 90%~100% utilization > ratio. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8692) Support node utilization metrics for SLS
[ https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8692: --- Attachment: (was: image-2018-08-21-17-50-04-011.png) > Support node utilization metrics for SLS > > > Key: YARN-8692 > URL: https://issues.apache.org/jira/browse/YARN-8692 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler-load-simulator >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: image-2018-08-21-18-03-59-665.png, > image-2018-08-21-18-04-22-749.png > > > The distribution of node utilization is an important healthy factor for the > YARN cluster, related metrics in SLS can be used to evaluate the scheduling > effects and optimize related configurations. > To implement this improvement, we need to do things as below: > (1) Add input configurations (contain avg and stddev for cpu/memory > utilization ratio) and generate utilization samples for tasks, not include AM > container cause I think it's negligible. (2) Simulate containers and node > utilization within node status. > (3) calculate and generate the distribution metrics and use standard > deviation metric (stddev for short) to evaluate the effects(smaller is > better). > (4) show these metrics on SLS simulator page like this: > !image-2018-08-21-18-04-22-749.png! > For Node memory/CPU utilization distribution graphs, Y-axis is nodes number, > and P0 represents 0%~9% utilization ratio(containers-utilization / > node-total-resource), P1 represents 10%~19% utilization ratio, P2 represents > 20%~29% utilization ratio, ..., at last P9 represents 90%~100% utilization > ratio. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8692) Support node utilization metrics for SLS
Tao Yang created YARN-8692:
--

Summary: Support node utilization metrics for SLS
Key: YARN-8692
URL: https://issues.apache.org/jira/browse/YARN-8692
Project: Hadoop YARN
Issue Type: Improvement
Components: scheduler-load-simulator
Affects Versions: 3.2.0
Reporter: Tao Yang
Assignee: Tao Yang
Attachments: image-2018-08-21-17-50-04-011.png

The distribution of node utilization is an important health factor for a YARN cluster. Related metrics in SLS can be used to evaluate scheduling effects and to tune related configurations.

To implement this improvement, we need to do the following:
(1) Add input configurations (avg and stddev of the cpu/memory utilization ratio) and generate utilization samples for tasks. The AM container is not included because I think its utilization is negligible.
(2) Simulate container and node utilization within the node status.
(3) Calculate and generate the distribution metrics, and use the standard deviation metric (stddev for short) to evaluate the effect (smaller is better).
(4) Show these metrics on the SLS simulator page like this:

!image-2018-08-21-17-50-04-011.png!

For the node memory/CPU utilization distribution graphs, the Y-axis is the number of nodes. P0 represents a 0%~9% utilization ratio (containers-utilization / node-total-resource), P1 represents 10%~19%, P2 represents 20%~29%, ..., and finally P9 represents 90%~100%.
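The bucketing and stddev described above can be sketched roughly as follows. This is only an illustration under assumptions, not code from any attached patch: the class name `NodeUtilizationDistribution` and its methods are hypothetical, utilization ratios are assumed to be given as doubles in [0, 1], and the population standard deviation is assumed (the description does not say which variant is intended).

```java
import java.util.Arrays;

// Hypothetical helper, not part of SLS: illustrates the P0..P9 buckets and stddev.
public class NodeUtilizationDistribution {

    /** Maps a utilization ratio (containers-utilization / node-total-resource) to bucket 0..9. */
    static int bucketOf(double ratio) {
        // Clamp to [0, 1] so out-of-range samples still land in a valid bucket.
        double clamped = Math.max(0.0, Math.min(1.0, ratio));
        // Exactly 100% belongs to P9 (90%~100%), hence the cap at 9.
        return Math.min(9, (int) (clamped * 10));
    }

    /** Counts nodes per bucket; index i corresponds to Pi. */
    static int[] histogram(double[] ratios) {
        int[] buckets = new int[10];
        for (double r : ratios) {
            buckets[bucketOf(r)]++;
        }
        return buckets;
    }

    /** Population standard deviation of the ratios (assumed variant); smaller means more even. */
    static double stddev(double[] ratios) {
        if (ratios.length == 0) {
            return 0.0;
        }
        double mean = Arrays.stream(ratios).average().orElse(0.0);
        double sumSq = 0.0;
        for (double r : ratios) {
            sumSq += (r - mean) * (r - mean);
        }
        return Math.sqrt(sumSq / ratios.length);
    }

    public static void main(String[] args) {
        double[] ratios = {0.05, 0.15, 0.15, 0.95, 1.0};
        // One node in P0, two in P1, two in P9.
        System.out.println(Arrays.toString(histogram(ratios)));
        System.out.println(stddev(ratios));
    }
}
```

With these assumptions, the Y-axis value for Pi is simply `histogram(ratios)[i]`, and a lower `stddev` across nodes would indicate a more balanced placement.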
[jira] [Commented] (YARN-7494) Add muti node lookup support for better placement
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587157#comment-16587157 ]

genericqa commented on YARN-7494:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 51s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 42s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 10 new + 671 unchanged - 4 fixed = 681 total (was 675) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 5s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m 48s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-7494 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936384/YARN-7494.19.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 8547b23bbe05 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 770d9d9 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21643/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| unit |
[jira] [Updated] (YARN-8683) Support scheduling request for outstanding requests info in RMAppAttemptBlock
[ https://issues.apache.org/jira/browse/YARN-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8683: --- Attachment: YARN-8683.004.patch > Support scheduling request for outstanding requests info in RMAppAttemptBlock > - > > Key: YARN-8683 > URL: https://issues.apache.org/jira/browse/YARN-8683 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8683.001.patch, YARN-8683.002.patch, > YARN-8683.003.patch, YARN-8683.004.patch, screenshot-1.png, screenshot-2.png > > > Currently outstanding requests info in app attempt page only show pending > resource requests, pending scheduling requests should be shown here too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8683) Support scheduling request for outstanding requests info in RMAppAttemptBlock
[ https://issues.apache.org/jira/browse/YARN-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587146#comment-16587146 ]

Tao Yang commented on YARN-8683:

Thanks [~cheersyang] for your suggestion! Attached v4 to improve the content of AllocationTags.

Updates:
{code:java}
- .append(resourceRequest.getAllocationTags() == null ? "N/A"
-     : resourceRequest.getAllocationTags())
+ .append(resourceRequest.getAllocationTags() == null ? "N/A" :
+     StringUtils.join(resourceRequest.getAllocationTags(), ","))
{code}

> Support scheduling request for outstanding requests info in RMAppAttemptBlock
> -
>
> Key: YARN-8683
> URL: https://issues.apache.org/jira/browse/YARN-8683
> Project: Hadoop YARN
> Issue Type: Bug
> Components: webapp
> Affects Versions: 3.2.0
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: YARN-8683.001.patch, YARN-8683.002.patch, YARN-8683.003.patch, YARN-8683.004.patch, screenshot-1.png, screenshot-2.png
>
>
> Currently outstanding requests info in app attempt page only show pending resource requests, pending scheduling requests should be shown here too.
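To illustrate why the change above drops the brackets, here is a small self-contained sketch. It is an assumption-laden stand-in rather than the patch itself: the class `AllocationTagsFormat` and the method `formatTags` are hypothetical, and the JDK's `String.join` is used in place of the `StringUtils.join` call from the patch so that the example needs no extra dependency.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical demo class; not taken from RMAppAttemptBlock.
public class AllocationTagsFormat {

    /** Joins tags with commas, or returns "N/A" when no tag set is present. */
    static String formatTags(Set<String> tags) {
        return tags == null ? "N/A" : String.join(",", tags);
    }

    public static void main(String[] args) {
        // LinkedHashSet keeps insertion order, which makes the output predictable.
        Set<String> tags = new LinkedHashSet<>();
        tags.add("tag1");
        tags.add("tag2");

        // Appending the collection directly uses its toString(), which adds brackets.
        System.out.println(String.valueOf(tags)); // [tag1, tag2]

        // Joining the elements yields the bracket-free form.
        System.out.println(formatTags(tags));     // tag1,tag2
        System.out.println(formatTags(null));     // N/A
    }
}
```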
[jira] [Commented] (YARN-4931) Preempted resources go back to the same application
[ https://issues.apache.org/jira/browse/YARN-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587141#comment-16587141 ]

Haibo Chen commented on YARN-4931:
--

This should have been fixed by YARN-6432, which has been included since 2.9.

> Preempted resources go back to the same application
> ---
>
> Key: YARN-4931
> URL: https://issues.apache.org/jira/browse/YARN-4931
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.7.2
> Reporter: Miles Crawford
> Priority: Major
> Attachments: resourcemanager.log
>
>
> Sometimes a queue that needs resources causes preemption - but the preempted containers are just allocated right back to the application that just released them!
> Here is a tiny application (0007) that wants resources, and a container is preempted from application 0002 to satisfy it:
> {code}
> 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler (FairSchedulerUpdateThread): Should preempt res for queue root.default: resDueToMinShare = , resDueToFairShare =
> 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler (FairSchedulerUpdateThread): Preempting container (prio=1res= vCores:1>) from queue root.milesc
> 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics (FairSchedulerUpdateThread): Non-AM container preempted, current appAttemptId=appattempt_1460047303577_0002_01, containerId=container_1460047303577_0002_01_001038, resource= vCores:1>
> 2016-04-07 21:08:13,463 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (FairSchedulerUpdateThread): container_1460047303577_0002_01_001038 Container Transitioned from RUNNING to KILLED
> {code}
> But then a moment later, application 2 gets the container right back:
> {code}
> 2016-04-07 21:08:13,844 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode (ResourceManager Event Processor): Assigned container container_1460047303577_0002_01_001039 of capacity on host ip-10-12-40-63.us-west-2.compute.internal:8041, which has 13 containers, used and available after allocation
> 2016-04-07 21:08:14,555 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (IPC Server handler 59 on 8030): container_1460047303577_0002_01_001039 Container Transitioned from ALLOCATED to ACQUIRED
> 2016-04-07 21:08:14,845 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1460047303577_0002_01_001039 Container Transitioned from ACQUIRED to RUNNING
> {code}
> This results in new applications being unable to even get an AM, and never starting at all.
[jira] [Commented] (YARN-8681) Wrong error message in RM placement constraints check
[ https://issues.apache.org/jira/browse/YARN-8681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587140#comment-16587140 ]

Weiwei Yang commented on YARN-8681:
---

Hi [~snemeth], let's close this as Won't Fix, since this code will be removed by YARN-8015. Thanks!

> Wrong error message in RM placement constraints check
> -
>
> Key: YARN-8681
> URL: https://issues.apache.org/jira/browse/YARN-8681
> Project: Hadoop YARN
> Issue Type: Bug
> Components: scheduler
> Affects Versions: 3.2.0, 3.1.1
> Reporter: Daniel Templeton
> Assignee: Szilard Nemeth
> Priority: Major
> Attachments: YARN-8681.001.patch
>
>
> In {{SingleConstraintAppPlacementAllocator.validateAndSetSchedulingRequest()}} I see the following:
> {code}
> if (singleConstraint.getMinCardinality() != 0
>     || singleConstraint.getMaxCardinality() != 0) {
>   throwExceptionWithMetaInfo(
>       "Only support anti-affinity, which is: minCardinality=0, "
>           + "maxCardinality=1");
> }
> {code}
> I think the error message should say {{"maxCardinality=0"}}
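For readers following the report, the mismatch is that the condition rejects anything other than min=0 and max=0, while the message advertises max=1. A minimal standalone sketch of a check whose message agrees with its condition might look like the following. Everything here is hypothetical: the class `AntiAffinityCheck`, the `validate` method, and the use of `IllegalArgumentException` are stand-ins for illustration and do not reproduce the YARN allocator or its `throwExceptionWithMetaInfo` helper.

```java
// Hypothetical standalone sketch; not the SingleConstraintAppPlacementAllocator code.
public class AntiAffinityCheck {

    /** Accepts only minCardinality=0 and maxCardinality=0, and says so in the message. */
    static void validate(int minCardinality, int maxCardinality) {
        if (minCardinality != 0 || maxCardinality != 0) {
            // The message states the same bounds the condition enforces.
            throw new IllegalArgumentException(
                "Only support anti-affinity, which is: minCardinality=0, "
                    + "maxCardinality=0 (got minCardinality=" + minCardinality
                    + ", maxCardinality=" + maxCardinality + ")");
        }
    }

    public static void main(String[] args) {
        validate(0, 0); // passes silently
        try {
            validate(0, 1); // rejected, even though the old message suggested max=1 was fine
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```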
[jira] [Commented] (YARN-8683) Support scheduling request for outstanding requests info in RMAppAttemptBlock
[ https://issues.apache.org/jira/browse/YARN-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587137#comment-16587137 ] Weiwei Yang commented on YARN-8683: --- Hi [~Tao Yang], yep, that looks better, could you please update the patch accordingly? > Support scheduling request for outstanding requests info in RMAppAttemptBlock > - > > Key: YARN-8683 > URL: https://issues.apache.org/jira/browse/YARN-8683 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8683.001.patch, YARN-8683.002.patch, > YARN-8683.003.patch, screenshot-1.png, screenshot-2.png > > > Currently outstanding requests info in app attempt page only show pending > resource requests, pending scheduling requests should be shown here too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8683) Support scheduling request for outstanding requests info in RMAppAttemptBlock
[ https://issues.apache.org/jira/browse/YARN-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587135#comment-16587135 ]

Tao Yang commented on YARN-8683:

Just noticed that the AllocationTags content shows "[tag1]" or "[tag1,tag2]". Would it be better to remove the brackets, so that it just shows "tag1" or "tag1,tag2"?

> Support scheduling request for outstanding requests info in RMAppAttemptBlock
> -
>
> Key: YARN-8683
> URL: https://issues.apache.org/jira/browse/YARN-8683
> Project: Hadoop YARN
> Issue Type: Bug
> Components: webapp
> Affects Versions: 3.2.0
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: YARN-8683.001.patch, YARN-8683.002.patch, YARN-8683.003.patch, screenshot-1.png, screenshot-2.png
>
>
> Currently outstanding requests info in app attempt page only show pending resource requests, pending scheduling requests should be shown here too.