[jira] [Commented] (YARN-7892) Revisit NodeAttribute class structure
[ https://issues.apache.org/jira/browse/YARN-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472932#comment-16472932 ] Bibin A Chundatt commented on YARN-7892: [~Naganarasimha] TestPipeApplication timeout is happening in trunk too. Overall the patch looks good. Will commit later today if no objections. > Revisit NodeAttribute class structure > - > > Key: YARN-7892 > URL: https://issues.apache.org/jira/browse/YARN-7892 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Major > Attachments: YARN-7892-YARN-3409.001.patch, > YARN-7892-YARN-3409.002.patch, YARN-7892-YARN-3409.003.WIP.patch, > YARN-7892-YARN-3409.003.patch, YARN-7892-YARN-3409.004.patch, > YARN-7892-YARN-3409.005.patch, YARN-7892-YARN-3409.006.patch, > YARN-7892-YARN-3409.007.patch, YARN-7892-YARN-3409.008.patch, > YARN-7892-YARN-3409.009.patch, YARN-7892-YARN-3409.010.patch > > > In the existing structure, we kept the type and value along with the > attribute, which creates confusion for users of the APIs, as > it is not clear what needs to be sent for type and value while > fetching the mappings for node(s). > In addition, equals will not make sense when we compare only prefix and > name, whereas their values might be different. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
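To make the restructuring under discussion concrete, here is a minimal sketch of the idea: the attribute identity (prefix plus name) is split into its own key class, and equality is defined only on that key, never on type or value. The class and field names below are illustrative assumptions, not necessarily the classes committed on the YARN-3409 branch.
{code}
// Sketch only: the identity of an attribute is prefix + name, and
// equals/hashCode are defined solely on that identity.
final class NodeAttributeKey {
  private final String prefix;   // e.g. "rm.yarn.io"
  private final String name;     // e.g. "hostname"

  NodeAttributeKey(String prefix, String name) {
    this.prefix = prefix;
    this.name = name;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof NodeAttributeKey)) {
      return false;
    }
    NodeAttributeKey other = (NodeAttributeKey) o;
    return prefix.equals(other.prefix) && name.equals(other.name);
  }

  @Override
  public int hashCode() {
    return 31 * prefix.hashCode() + name.hashCode();
  }
}

// The full attribute carries the key plus its type and value; callers that
// fetch node-to-attribute mappings only need to supply the key.
final class NodeAttributeInfo {
  private final NodeAttributeKey key;
  private final String type;    // e.g. "STRING"
  private final String value;   // e.g. "host-1234"

  NodeAttributeInfo(NodeAttributeKey key, String type, String value) {
    this.key = key;
    this.type = type;
    this.value = value;
  }

  NodeAttributeKey getKey() {
    return key;
  }
}
{code}
With this split, two attributes with the same prefix and name but different values compare equal by key, which is the behavior the description asks for.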
[jira] [Commented] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472927#comment-16472927 ] Hudson commented on YARN-8265: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14181 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14181/]) YARN-8265. Improve DNS handling on docker IP changes. (eyang: rev 0ff94563b9b62d0426d475dc0f84152b68f1ff0d) * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerClient.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestServiceAM.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/MockServiceAM.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/instance/ComponentInstance.java > Service AM should retrieve new IP for docker container relaunched by NM > --- > > Key: YARN-8265 > URL: https://issues.apache.org/jira/browse/YARN-8265 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8265.001.patch, YARN-8265.002.patch, > YARN-8265.003.patch > > > When a docker container is restarted, it gets a new IP, but the service AM > only retrieves one IP for a container and then cancels the container status > retriever. I suspect the issue would be solved by restarting the retriever > (if it has been canceled) when the onContainerRestart callback is received, > but we'll have to do some testing to make sure this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472923#comment-16472923 ] Eric Yang commented on YARN-8265: - +1 looks good to me. I just committed this on trunk and branch-3.1. Thank you [~billie.rinaldi] for the review and patch. > Service AM should retrieve new IP for docker container relaunched by NM > --- > > Key: YARN-8265 > URL: https://issues.apache.org/jira/browse/YARN-8265 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8265.001.patch, YARN-8265.002.patch, > YARN-8265.003.patch > > > When a docker container is restarted, it gets a new IP, but the service AM > only retrieves one IP for a container and then cancels the container status > retriever. I suspect the issue would be solved by restarting the retriever > (if it has been canceled) when the onContainerRestart callback is received, > but we'll have to do some testing to make sure this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8265: Target Version/s: 3.2.0, 3.1.1 (was: 3.2.0) Fix Version/s: 3.1.1 3.2.0 > Service AM should retrieve new IP for docker container relaunched by NM > --- > > Key: YARN-8265 > URL: https://issues.apache.org/jira/browse/YARN-8265 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8265.001.patch, YARN-8265.002.patch, > YARN-8265.003.patch > > > When a docker container is restarted, it gets a new IP, but the service AM > only retrieves one IP for a container and then cancels the container status > retriever. I suspect the issue would be solved by restarting the retriever > (if it has been canceled) when the onContainerRestart callback is received, > but we'll have to do some testing to make sure this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472901#comment-16472901 ] Eric Yang commented on YARN-8265: - The "onContainerRestart" event is currently not working, so the workaround is the only feasible solution. Therefore, I am inclined to commit patch 003 for the 3.1.1 release. > Service AM should retrieve new IP for docker container relaunched by NM > --- > > Key: YARN-8265 > URL: https://issues.apache.org/jira/browse/YARN-8265 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Critical > Attachments: YARN-8265.001.patch, YARN-8265.002.patch, > YARN-8265.003.patch > > > When a docker container is restarted, it gets a new IP, but the service AM > only retrieves one IP for a container and then cancels the container status > retriever. I suspect the issue would be solved by restarting the retriever > (if it has been canceled) when the onContainerRestart callback is received, > but we'll have to do some testing to make sure this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472895#comment-16472895 ] genericqa commented on YARN-4599: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 20m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 34m 33s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 15s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 28m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 58s{color} | {color:green} root: The patch generated 0 new + 230 unchanged - 1 fixed = 230 total (was 231) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 19m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 12s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 25s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}173m 16s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 52s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}376m 58s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.client.impl.TestBlockReaderLocal | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor | \\ \\ || Subsystem || Report/Notes || |
[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472863#comment-16472863 ] Íñigo Goiri commented on YARN-8275: --- [~miklos.szeg...@cloudera.com], we would have to decide how to write the native service and that opens a big design space. Do you have any proposal for that? In any case, I would make this pluggable and then we can rely on winutils, a separate service or JNI. > Create a JNI interface to interact with Windows > --- > > Key: YARN-8275 > URL: https://issues.apache.org/jira/browse/YARN-8275 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: WinUtils-Functions.pdf, WinUtils.CSV > > > I did a quick investigation of the performance of WinUtils in YARN. On > average, the NM invokes WinUtils 4.76 times per second and 65.51 times per container. > > | |Requests|Requests/sec|Requests/min|Requests/container| > |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*| > |[WinUtils] Execute -help|4148|0.145|8.769|2.007| > |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37| > |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43| > |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37| > |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05| > Interval: 7 hours, 53 minutes and 48 seconds > Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops. > This means *666.58* IO ops/second due to WinUtils. > We should start considering removing WinUtils from Hadoop and creating a JNI > interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
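To illustrate the pluggable direction suggested in the comment above, here is a minimal sketch in which NodeManager code depends only on an interface, and the backend (the winutils shell-out used today, a separate native service, or JNI) can be swapped. The interface and class names are made up for this sketch and are not an existing Hadoop API.
{code}
import java.io.IOException;
import java.nio.file.Path;

/**
 * Hypothetical plug point: the NodeManager would code against this interface,
 * and the backing implementation (winutils shell-out, a long-running native
 * service, or JNI) would be selected by configuration.
 */
interface WindowsNativeOps {
  void createSymlink(Path link, Path target) throws IOException;
  boolean isTaskAlive(String taskName) throws IOException;
}

/** Today's behaviour expressed as one pluggable backend: fork winutils.exe per call. */
class WinUtilsBackedOps implements WindowsNativeOps {
  @Override
  public void createSymlink(Path link, Path target) throws IOException {
    run("winutils.exe", "symlink", link.toString(), target.toString());
  }

  @Override
  public boolean isTaskAlive(String taskName) throws IOException {
    return run("winutils.exe", "task", "isAlive", taskName) == 0;
  }

  private int run(String... cmd) throws IOException {
    try {
      return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("interrupted while running " + cmd[0], e);
    }
  }
}

// A JNI-backed class would implement the same interface with native methods,
// avoiding the roughly 140 IO ops that each winutils fork incurs.
{code}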
[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container
[ https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472839#comment-16472839 ] Eric Yang commented on YARN-7654: - [~jlowe] Thank you for the great reviews and commit. [~shaneku...@gmail.com] [~Jim_Brennan] [~ebadger] Thank you for the reviews. > Support ENTRY_POINT for docker container > > > Key: YARN-7654 > URL: https://issues.apache.org/jira/browse/YARN-7654 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-7654.001.patch, YARN-7654.002.patch, > YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, > YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, > YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, > YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, > YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, > YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, > YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch, > YARN-7654.024.patch > > > Docker image may have ENTRY_POINT predefined, but this is not supported in > the current implementation. It would be nice if we could detect the existence of > {{launch_command}} and, based on this variable, launch the docker container in > different ways: > h3. Launch command exists > {code} > docker run [image]:[version] > docker exec [container_id] [launch_command] > {code} > h3. Use ENTRY_POINT > {code} > docker run [image]:[version] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472837#comment-16472837 ] Eric Yang commented on YARN-8265: - [~billie.rinaldi] I am struggling to understand why the node manager would decide to restart the docker container without consulting the application master. The AM decides the state of the containers, and the node manager only follows orders from the AM. This helps prevent race conditions between the AM and NM over which container should stay up and running, and the AM follows state transitions to ensure it stays on a pre-defined path. With container relaunch implemented in YARN-7973, the AM still decides when to restart a container. The "onContainerRestart" event will be received by the AM. If we run ContainerStartedTransition again, it will check for IP changes and cancel the scheduled timer thread; I think this leads to a more desirable outcome without leaving the timer thread open-ended. An alternative approach is to move the ContainerStatusRetriever to ContainerBecomeReadyTransition and use the BECOME_READY transition to check for the IP address. > Service AM should retrieve new IP for docker container relaunched by NM > --- > > Key: YARN-8265 > URL: https://issues.apache.org/jira/browse/YARN-8265 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Critical > Attachments: YARN-8265.001.patch, YARN-8265.002.patch, > YARN-8265.003.patch > > > When a docker container is restarted, it gets a new IP, but the service AM > only retrieves one IP for a container and then cancels the container status > retriever. I suspect the issue would be solved by restarting the retriever > (if it has been canceled) when the onContainerRestart callback is received, > but we'll have to do some testing to make sure this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
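To make the approach under discussion concrete, below is a minimal, self-contained sketch of the option described in the issue summary: re-arm the cancelled status retriever when the restart callback arrives. The class, field, and method names are simplified assumptions and do not map one-to-one onto ComponentInstance or the attached patches.
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Simplified sketch: keep a handle to the once-per-second status retriever,
 * cancel it once an IP is obtained, and re-schedule it when the NM reports
 * that the container was relaunched (docker relaunch assigns a new IP).
 */
class ContainerIpTracker {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final AtomicReference<ScheduledFuture<?>> retriever =
      new AtomicReference<>();

  /** Called when the container is started (or re-started) on the NM. */
  void startStatusRetriever(Runnable pollContainerStatus) {
    ScheduledFuture<?> existing = retriever.get();
    if (existing == null || existing.isCancelled() || existing.isDone()) {
      retriever.set(scheduler.scheduleAtFixedRate(
          pollContainerStatus, 0, 1, TimeUnit.SECONDS));
    }
  }

  /** Called once a (new) IP has been retrieved for the container. */
  void onIpObtained() {
    ScheduledFuture<?> existing = retriever.get();
    if (existing != null) {
      existing.cancel(false);   // stop polling until the next (re)start
    }
  }

  /** NMClientAsync-style callback when the NM relaunches the container. */
  void onContainerRestart(Runnable pollContainerStatus) {
    startStatusRetriever(pollContainerStatus);   // resume polling for the new IP
  }
}
{code}
The alternative raised in the comment, keeping the retriever running for the container's whole lifetime, avoids relying on the restart callback but leaves the one-second polling task live indefinitely, which is the trade-off Billie asks about further down.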
[jira] [Commented] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472835#comment-16472835 ] genericqa commented on YARN-8265: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 14s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core: The patch generated 1 new + 30 unchanged - 0 fixed = 31 total (was 30) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 15s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 61m 56s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8265 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923110/YARN-8265.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 09175230dafa 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4b4f24a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20710/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20710/testReport/ | | Max. process+thread count | 777 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applicati
[jira] [Commented] (YARN-8271) Change UI2 labeling of certain tables to avoid confusion
[ https://issues.apache.org/jira/browse/YARN-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472833#comment-16472833 ] genericqa commented on YARN-8271: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 37m 4s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 46s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 50m 9s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8271 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922720/YARN-8271.0001.patch | | Optional Tests | asflicense shadedclient | | uname | Linux c794f8a44b0b 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4b4f24a | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 301 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20711/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. 
> Change UI2 labeling of certain tables to avoid confusion > > > Key: YARN-8271 > URL: https://issues.apache.org/jira/browse/YARN-8271 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: YARN-8271.0001.patch > > > Update labeling for a few items to avoid confusion > - Cluster Page (/cluster-overview): > -- "Finished apps" --> "Finished apps from all users" > -- "Running apps" --> "Running apps from all users" > - Queues overview page (/yarn-queues/root) && Per queue page > (/yarn-queue/root/apps) > -- "Running Apps" --> "Running apps from all users in queue " > - Nodes Page - side bar for all pages > -- "List of Applications" --> "List of Applications on this node" > -- "List of Containers" --> "List of Containers on this node" > - Yarn Tools > ** Yarn Tools --> YARN Tools > - Queue page > ** Running Apps: --> Running Apps From All Users -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8266) Clicking on application from cluster view should redirect to application attempt page
[ https://issues.apache.org/jira/browse/YARN-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472830#comment-16472830 ] genericqa commented on YARN-8266: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 32m 52s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 44m 37s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8266 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922566/YARN-8266.001.patch | | Optional Tests | asflicense shadedclient | | uname | Linux 2c9915d5c292 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4b4f24a | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 407 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20713/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Clicking on application from cluster view should redirect to application > attempt page > - > > Key: YARN-8266 > URL: https://issues.apache.org/jira/browse/YARN-8266 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: YARN-8266.001.patch > > > Steps: > 1) Start one application > 2) Go to cluster overview page > 3) Click on applicationId from Cluster Resource Usage By Application > This action redirects to > [http://xxx:8088/ui2/#/yarn-app/application_1525740862939_0005] url. This is > invalid url. It does not show any details. > Instead It should redirect to attempt page. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container
[ https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472822#comment-16472822 ] Hudson commented on YARN-7654: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14180 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14180/]) YARN-7654. Support ENTRY_POINT for docker container. Contributed by Eric (jlowe: rev 6c8e51ca7eaaeef0626658b3c45d446a537e4dc0) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerClient.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerRunCommand.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.h * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/provider/AbstractProviderService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/provider/docker/DockerProviderService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java > Support ENTRY_POINT for docker container > > > Key: YARN-7654 > URL: https://issues.apache.org/jira/browse/YARN-7654 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-7654.001.patch, YARN-7654.002.patch, > YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, > YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, > YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, > YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, > YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, > YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, > YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch, > YARN-7654.024.patch > > > Docker image may have ENTRY_POINT predefined, but this is not supported in > the current implementation. 
It would be nice if we could detect the existence of > {{launch_command}} and, based on this variable, launch the docker container in > different ways: > h3. Launch command exists > {code} > docker run [image]:[version] > docker exec [container_id] [launch_command] > {code} > h3. Use ENTRY_POINT > {code} > docker run [image]:[version] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container
[ https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472814#comment-16472814 ] Jason Lowe commented on YARN-7654: -- Thanks for updating the patch! The unit test failure does not appear to be related, and the test passes for me with the patch applied. Looks like it is a known flaky test according to YARN-7145. +1 for patch 024. Committing this. > Support ENTRY_POINT for docker container > > > Key: YARN-7654 > URL: https://issues.apache.org/jira/browse/YARN-7654 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Labels: Docker > Attachments: YARN-7654.001.patch, YARN-7654.002.patch, > YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, > YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, > YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, > YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, > YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, > YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, > YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch, > YARN-7654.024.patch > > > Docker image may have ENTRY_POINT predefined, but this is not supported in > the current implementation. It would be nice if we could detect the existence of > {{launch_command}} and, based on this variable, launch the docker container in > different ways: > h3. Launch command exists > {code} > docker run [image]:[version] > docker exec [container_id] [launch_command] > {code} > h3. Use ENTRY_POINT > {code} > docker run [image]:[version] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
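As a rough illustration of the branching originally proposed in the issue description (the committed patches may differ in the exact docker invocation), the sketch below derives the docker commands to issue from whether a launch_command is present. The flags and helper names are assumptions added for readability.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch only: when launch_command is present it is executed inside the
 * started container; when it is absent, the image's ENTRY_POINT runs on its own.
 */
final class DockerLaunchPlan {

  /** Returns the docker command lines to issue, in order, for one container. */
  static List<List<String>> plan(String imageAndVersion, String containerId,
                                 String launchCommand) {
    List<List<String>> commands = new ArrayList<>();
    // The container is always started from the image; with no launch command
    // this alone lets the predefined ENTRY_POINT run.
    commands.add(Arrays.asList(
        "docker", "run", "-d", "--name", containerId, imageAndVersion));
    if (launchCommand != null && !launchCommand.trim().isEmpty()) {
      // launch_command exists: run it inside the already-started container.
      commands.add(Arrays.asList(
          "docker", "exec", containerId, "bash", "-c", launchCommand));
    }
    return commands;
  }
}
{code}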
[jira] [Commented] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472808#comment-16472808 ] Billie Rinaldi commented on YARN-8265: -- Patch 3 fixes the remaining issues I have identified. One more question: a status retriever runs once per second for each container. Is this still appropriate now that they will run forever for docker containers? > Service AM should retrieve new IP for docker container relaunched by NM > --- > > Key: YARN-8265 > URL: https://issues.apache.org/jira/browse/YARN-8265 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Critical > Attachments: YARN-8265.001.patch, YARN-8265.002.patch, > YARN-8265.003.patch > > > When a docker container is restarted, it gets a new IP, but the service AM > only retrieves one IP for a container and then cancels the container status > retriever. I suspect the issue would be solved by restarting the retriever > (if it has been canceled) when the onContainerRestart callback is received, > but we'll have to do some testing to make sure this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-4599: - Attachment: Elastic Memory Control in YARN.pdf > Set OOM control for memory cgroups > -- > > Key: YARN-4599 > URL: https://issues.apache.org/jira/browse/YARN-4599 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Miklos Szegedi >Priority: Major > Labels: oct16-medium > Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, > YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, > YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, > YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch > > > YARN-1856 adds memory cgroups enforcing support. We should also explicitly > set OOM control so that containers are not killed as soon as they go over > their usage. Today, one could set the swappiness to control this, but > clusters with swap turned off exist. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
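For readers unfamiliar with the cgroup knob involved, the sketch below shows what "explicitly set OOM control" amounts to under cgroup v1: writing 1 to the cgroup's memory.oom_control pauses tasks that hit the memory limit instead of letting the kernel kill them immediately, leaving the kill-or-resume decision to the NodeManager. The path and class are assumptions for illustration only; the actual patch goes through the NM's cgroups handling code rather than writing files directly.
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Minimal sketch, assuming a cgroup v1 memory hierarchy at the usual mount point. */
class OomControlSketch {
  // Hypothetical cgroup path for YARN containers; not a YARN constant.
  private static final String CGROUP_ROOT = "/sys/fs/cgroup/memory/hadoop-yarn";

  /** Disable the kernel's default behaviour of killing tasks in this cgroup on OOM. */
  static void disableOomKiller(String containerId) throws IOException {
    Path oomControl = Paths.get(CGROUP_ROOT, containerId, "memory.oom_control");
    // "1" freezes the cgroup's tasks when the limit is hit instead of killing
    // them, so the NodeManager can decide which container to kill or resume.
    Files.write(oomControl, "1".getBytes(StandardCharsets.UTF_8));
  }
}
{code}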
[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container
[ https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472806#comment-16472806 ] genericqa commented on YARN-7654: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 11s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 11m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 18s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 36s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 14 new + 89 unchanged - 0 fixed = 103 total (was 89) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 53s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 30s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 55s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 44s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {
[jira] [Updated] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-8265: - Attachment: YARN-8265.003.patch > Service AM should retrieve new IP for docker container relaunched by NM > --- > > Key: YARN-8265 > URL: https://issues.apache.org/jira/browse/YARN-8265 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Critical > Attachments: YARN-8265.001.patch, YARN-8265.002.patch, > YARN-8265.003.patch > > > When a docker container is restarted, it gets a new IP, but the service AM > only retrieves one IP for a container and then cancels the container status > retriever. I suspect the issue would be solved by restarting the retriever > (if it has been canceled) when the onContainerRestart callback is received, > but we'll have to do some testing to make sure this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications
[ https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472786#comment-16472786 ] Rohith Sharma K S commented on YARN-8130: - Test failures are unrelated to this patch. [~haibochen]/[~vrushalic] could you please help to commit this. > Race condition when container events are published for KILLED applications > -- > > Key: YARN-8130 > URL: https://issues.apache.org/jira/browse/YARN-8130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Reporter: Charan Hebri >Assignee: Rohith Sharma K S >Priority: Major > Attachments: YARN-8130.01.patch, YARN-8130.02.patch, > YARN-8130.03.patch > > > There seems to be a race condition happening when an application is KILLED > and the corresponding container event information is being published. For > completed containers, a YARN_CONTAINER_FINISHED event is generated but for > some containers in a KILLED application this information is missing. Below is > a node manager log snippet, > {code:java} > 2018-04-09 08:44:54,474 INFO shuffle.ExternalShuffleBlockResolver > (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application > application_1523259757659_0003 removed, cleanupLocalDirs = false > 2018-04-09 08:44:54,478 INFO application.ApplicationImpl > (ApplicationImpl.java:handle(632)) - Application > application_1523259757659_0003 transitioned from > APPLICATION_RESOURCES_CLEANINGUP to FINISHED > 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher > (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been > removed before the entity could be published for > TimelineEntity[type='YARN_CONTAINER', > id='container_1523259757659_0003_01_02'] > 2018-04-09 08:44:54,478 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just > finished : application_1523259757659_0003 > 2018-04-09 08:44:54,488 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_01. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:54,492 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_02. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:55,470 INFO collector.TimelineCollectorManager > (TimelineCollectorManager.java:remove(192)) - The collector service for > application_1523259757659_0003 was removed > 2018-04-09 08:44:55,472 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:handle(1572)) - couldn't find application > application_1523259757659_0003 while processing FINISH_APPS event. The > ResourceManager allocated resources for this application to the NodeManager > but no active containers were found to process{code} > The container id specified in the log, > *container_1523259757659_0003_01_02* is the one that has the finished > event missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472780#comment-16472780 ] Eric Yang commented on YARN-8274: - [~jlowe] Thank you for all your efforts. It is greatly appreciated. > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8274.001.patch, YARN-8274.002.patch > > > I initiated container relaunch with a "sleep 60; exit 1" launch command and > saw a "not a docker command" error on relaunch. Haven't figured out why this > is happening, but it seems like it has been introduced recently to > trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger] > {noformat} > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Relaunch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1525897486447_0003_01_02 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception > message: Relaunch container failed > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error > output: docker: 'container_1525897486447_0003_01_02' is not a docker > command. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472779#comment-16472779 ] genericqa commented on YARN-8080: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 44s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 0s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 26s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 92 new + 121 unchanged - 2 fixed = 213 total (was 123) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 47s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 37s{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 94m 15s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8080 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attac
[jira] [Commented] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472777#comment-16472777 ] Jason Lowe commented on YARN-8274: -- bq. It would be nice if the code was refactored to add docker_binary in construct_docker_command to avoid duplicated add_to_args for docker_binary for all get_docker_*_command, but the priority is to get a good stable state for release. I was thinking the exact same thing as I was writing the patch. I went for the simple approach to keep the patch small and easy to review since it's a bugfix. I filed YARN-8284 to track that. > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8274.001.patch, YARN-8274.002.patch > > > I initiated container relaunch with a "sleep 60; exit 1" launch command and > saw a "not a docker command" error on relaunch. Haven't figured out why this > is happening, but it seems like it has been introduced recently to > trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger] > {noformat} > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Relaunch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1525897486447_0003_01_02 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception > message: Relaunch container failed > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error > output: docker: 'container_1525897486447_0003_01_02' is not a docker > command. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container
[ https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472778#comment-16472778 ] Eric Yang commented on YARN-7654: - [~jlowe] All 5 scenarios passed with my local kerberos enabled cluster tests. > Support ENTRY_POINT for docker container > > > Key: YARN-7654 > URL: https://issues.apache.org/jira/browse/YARN-7654 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Labels: Docker > Attachments: YARN-7654.001.patch, YARN-7654.002.patch, > YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, > YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, > YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, > YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, > YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, > YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, > YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch, > YARN-7654.024.patch > > > Docker image may have ENTRY_POINT predefined, but this is not supported in > the current implementation. It would be nice if we can detect existence of > {{launch_command}} and base on this variable launch docker container in > different ways: > h3. Launch command exists > {code} > docker run [image]:[version] > docker exec [container_id] [launch_command] > {code} > h3. Use ENTRY_POINT > {code} > docker run [image]:[version] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8284) get_docker_command refactoring
Jason Lowe created YARN-8284: Summary: get_docker_command refactoring Key: YARN-8284 URL: https://issues.apache.org/jira/browse/YARN-8284 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.2.0, 3.1.1 Reporter: Jason Lowe YARN-8274 occurred because get_docker_command's helper functions each have to remember to put the docker binary as the first argument. This is error prone and causes code duplication for each of the helper functions. It would be safer and simpler if get_docker_command initialized the docker binary argument in one place and each of the helper functions only added the arguments specific to their particular docker sub-command. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472772#comment-16472772 ] Chandni Singh commented on YARN-8141: - We had a discussion offline that * [~leftnoteasy]'s use-case is unblocked by providing configuration files in the spec * Will convert this jira to remove {{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}}. {{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} can support user specified mounts of localized resources as well. * Mark this jira for next release. > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
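For illustration of the overwrite described above, a minimal sketch of the alternative (appending the AM-generated mounts to whatever the user already put in the spec, instead of clobbering it) inside something like AbstractLauncher could look as follows. This is a sketch only, not the agreed fix (the direction above is to remove the variable and rely on YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS / the configuration files API); the helper name is hypothetical.
{code:java}
// Sketch only: preserve a user-specified YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS
// value and append the AM-generated localized-resource mounts, instead of overwriting it.
// The method name is hypothetical; env and mountPaths mirror the snippet quoted above.
static void addLocalResourceMounts(java.util.Map<String, String> env,
    java.util.Map<String, String> mountPaths) {
  final String key = "YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS";
  StringBuilder sb = new StringBuilder();
  String userValue = env.get(key);
  if (userValue != null && !userValue.isEmpty()) {
    sb.append(userValue);  // mounts the user declared in the service spec
  }
  for (java.util.Map.Entry<String, String> mount : mountPaths.entrySet()) {
    if (sb.length() > 0) {
      sb.append(",");
    }
    sb.append(mount.getKey()).append(":").append(mount.getValue());
  }
  env.put(key, sb.toString());
}
{code}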
[jira] [Updated] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8141: Target Version/s: 3.2.0 (was: 3.1.1) > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472771#comment-16472771 ] Eric Yang commented on YARN-8274: - [~ebadger] Your earnest advocacy is not going unheard. I am sorry that I introduced bugs during the rebase. There is no excuse for making mistakes when a patch is snowballing. It won't happen again. [~jlowe] Nits: It would be nice if the code was refactored to add docker_binary in construct_docker_command to avoid duplicated add_to_args for docker_binary for all get_docker_*_command, but the priority is to get a good stable state for release. Hence, I am sorry that I committed this prematurely without listening to my inner voice. > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8274.001.patch, YARN-8274.002.patch > > > I initiated container relaunch with a "sleep 60; exit 1" launch command and > saw a "not a docker command" error on relaunch. Haven't figured out why this > is happening, but it seems like it has been introduced recently to > trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger] > {noformat} > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Relaunch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1525897486447_0003_01_02 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception > message: Relaunch container failed > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error > output: docker: 'container_1525897486447_0003_01_02' is not a docker > command. 
> {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8283) [Umbrella] MaWo - A Master Worker framework on top of YARN Services
Yesha Vora created YARN-8283: Summary: [Umbrella] MaWo - A Master Worker framework on top of YARN Services Key: YARN-8283 URL: https://issues.apache.org/jira/browse/YARN-8283 Project: Hadoop YARN Issue Type: New Feature Reporter: Yesha Vora There is a need for an application / framework to handle Master-Worker scenarios. There are existing frameworks on YARN which can be used to run a job in distributed manner such as Mapreduce, Tez, Spark etc. But master-worker use-cases usually are force-fed into one of these existing frameworks which have been designed primarily around data-parallelism instead of generic Master Worker type of computations. In this JIRA, we’d like to contribute MaWo - a YARN Service based framework that achieves this goal. The overall goal is to create an app that can take an input job specification with tasks, their durations and have a Master dish the tasks off to a predetermined set of workers. The components will be responsible for making sure that the tasks and the overall job finish in specific time durations. We have been using a version of the MaWo framework for running unit tests of Hadoop in a parallel manner on an existing Hadoop YARN cluster. What typically takes 10 hours to run all of Hadoop project’s unit-tests can finish under 20 minutes on a MaWo app of about 50 containers! YARN-3307 was an original attempt at this but through a first-class YARN app. In this JIRA, we instead use YARN Service for orchestration so that our code can focus on the core Master Worker paradigm. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8283) [Umbrella] MaWo - A Master Worker framework on top of YARN Services
[ https://issues.apache.org/jira/browse/YARN-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yesha Vora reassigned YARN-8283: Assignee: Yesha Vora > [Umbrella] MaWo - A Master Worker framework on top of YARN Services > --- > > Key: YARN-8283 > URL: https://issues.apache.org/jira/browse/YARN-8283 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > > There is a need for an application / framework to handle Master-Worker > scenarios. There are existing frameworks on YARN which can be used to run a > job in distributed manner such as Mapreduce, Tez, Spark etc. But > master-worker use-cases usually are force-fed into one of these existing > frameworks which have been designed primarily around data-parallelism instead > of generic Master Worker type of computations. > In this JIRA, we’d like to contribute MaWo - a YARN Service based framework > that achieves this goal. The overall goal is to create an app that can take > an input job specification with tasks, their durations and have a Master dish > the tasks off to a predetermined set of workers. The components will be > responsible for making sure that the tasks and the overall job finish in > specific time durations. > We have been using a version of the MaWo framework for running unit tests of > Hadoop in a parallel manner on an existing Hadoop YARN cluster. What > typically takes 10 hours to run all of Hadoop project’s unit-tests can finish > under 20 minutes on a MaWo app of about 50 containers! > YARN-3307 was an original attempt at this but through a first-class YARN app. > In this JIRA, we instead use YARN Service for orchestration so that our code > can focus on the core Master Worker paradigm. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472761#comment-16472761 ] genericqa commented on YARN-8265: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 43s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 33s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 13s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core: The patch generated 1 new + 30 unchanged - 0 fixed = 31 total (was 30) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 26s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 61m 32s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8265 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923092/YARN-8265.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ade0de89ae2d 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4b4f24a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20708/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20708/testReport/ | | Max. process+thread count | 777 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applicat
[jira] [Assigned] (YARN-7340) Missing the time stamp in exception message in Class NoOverCommitPolicy
[ https://issues.apache.org/jira/browse/YARN-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-7340: -- Assignee: Dinesh Chitlangia > Missing the time stamp in exception message in Class NoOverCommitPolicy > --- > > Key: YARN-7340 > URL: https://issues.apache.org/jira/browse/YARN-7340 > Project: Hadoop YARN > Issue Type: Bug > Components: reservation system >Affects Versions: 3.1.0 >Reporter: Yufei Gu >Assignee: Dinesh Chitlangia >Priority: Minor > Labels: newbie++ > > It could be easily figured out by reading code. > {code} > throw new ResourceOverCommitException( > "Resources at time " + " would be overcommitted by " > + "accepting reservation: " + reservation.getReservationId()); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
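The fix is essentially to put the missing value back into the message. A sketch of the corrected throw is below; the variable name tickTime is a placeholder for whichever instant NoOverCommitPolicy is actually validating, which has to be confirmed by reading the surrounding code.
{code:java}
// Sketch of the intended message with the time actually included. "tickTime" is a
// placeholder; substitute the variable holding the instant being checked.
throw new ResourceOverCommitException(
    "Resources at time " + tickTime + " would be overcommitted by "
        + "accepting reservation: " + reservation.getReservationId());
{code}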
[jira] [Updated] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-8265: - Summary: Service AM should retrieve new IP for docker container relaunched by NM (was: AM should retrieve new IP for restarted container) > Service AM should retrieve new IP for docker container relaunched by NM > --- > > Key: YARN-8265 > URL: https://issues.apache.org/jira/browse/YARN-8265 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Critical > Attachments: YARN-8265.001.patch, YARN-8265.002.patch > > > When a docker container is restarted, it gets a new IP, but the service AM > only retrieves one IP for a container and then cancels the container status > retriever. I suspect the issue would be solved by restarting the retriever > (if it has been canceled) when the onContainerRestart callback is received, > but we'll have to do some testing to make sure this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
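A rough sketch of the suspected fix described in the report, restarting the cancelled status retriever from the onContainerRestart callback so the service AM picks up the container's new IP. All names other than onContainerRestart are hypothetical, and the actual patch may take a different approach.
{code:java}
// Hypothetical sketch, not the actual patch: re-arm the container status retriever when the
// NM reports it relaunched the container, so the AM can learn the container's new IP.
public void onContainerRestart(ContainerId containerId) {
  // statusRetrieverCancelled (AtomicBoolean) and scheduleContainerStatusRetriever are assumed names.
  if (statusRetrieverCancelled.getAndSet(false)) {
    scheduleContainerStatusRetriever(containerId);  // resume polling the NM for container status
  }
}
{code}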
[jira] [Commented] (YARN-7340) Missing the time stamp in exception message in Class NoOverCommitPolicy
[ https://issues.apache.org/jira/browse/YARN-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472745#comment-16472745 ] Dinesh Chitlangia commented on YARN-7340: - [~yufeigu] I would like to work on this. Could you please assign this to me? Thank you. > Missing the time stamp in exception message in Class NoOverCommitPolicy > --- > > Key: YARN-7340 > URL: https://issues.apache.org/jira/browse/YARN-7340 > Project: Hadoop YARN > Issue Type: Bug > Components: reservation system >Affects Versions: 3.1.0 >Reporter: Yufei Gu >Priority: Minor > Labels: newbie++ > > It could be easily figured out by reading code. > {code} > throw new ResourceOverCommitException( > "Resources at time " + " would be overcommitted by " > + "accepting reservation: " + reservation.getReservationId()); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8243) Flex down should remove instance with largest component instance ID first
[ https://issues.apache.org/jira/browse/YARN-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472742#comment-16472742 ] Gour Saha commented on YARN-8243: - Thanks [~billie.rinaldi] and [~suma.shivaprasad] for reviewing. Also thanks to [~billie.rinaldi] for committing the patch. > Flex down should remove instance with largest component instance ID first > - > > Key: YARN-8243 > URL: https://issues.apache.org/jira/browse/YARN-8243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Gour Saha >Assignee: Gour Saha >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8243.01.patch, YARN-8243.02.patch > > > This is easy to test on a service with anti-affinity component, to simulate > pending container requests. It can be simulated by other means also (no > resource left in cluster, etc.). > Service yarnfile used to test this - > {code:java} > { > "name": "sleeper-service", > "version": "1", > "components" : > [ > { > "name": "ping", > "number_of_containers": 2, > "resource": { > "cpus": 1, > "memory": "256" > }, > "launch_command": "sleep 9000", > "placement_policy": { > "constraints": [ > { > "type": "ANTI_AFFINITY", > "scope": "NODE", > "target_tags": [ > "ping" > ] > } > ] > } > } > ] > } > {code} > Launch a service with the above yarnfile as below - > {code:java} > yarn app -launch simple-aa-1 simple_AA.json > {code} > Let's assume there are only 5 nodes in this cluster. Now, flex the above > service to 1 extra container than the number of nodes (6 in my case). > {code:java} > yarn app -flex simple-aa-1 -component ping 6 > {code} > Only 5 containers will be allocated and running for simple-aa-1. At this > point, flex it down to 5 containers - > {code:java} > yarn app -flex simple-aa-1 -component ping 5 > {code} > This is what is seen in the serviceam log at this point - > {noformat} > 2018-05-03 20:17:38,469 [IPC Server handler 0 on 38124] INFO > service.ClientAMService - Flexing component ping to 5 > 2018-05-03 20:17:38,469 [Component dispatcher] INFO component.Component - > [FLEX DOWN COMPONENT ping]: scaling down from 6 to 5 > 2018-05-03 20:17:38,470 [Component dispatcher] INFO > instance.ComponentInstance - [COMPINSTANCE ping-4 : > container_1525297086734_0013_01_06]: Flexed down by user, destroying. > 2018-05-03 20:17:38,473 [Component dispatcher] INFO component.Component - > [COMPONENT ping] Transitioned from FLEXING to STABLE on FLEX event. 
> 2018-05-03 20:17:38,474 [pool-5-thread-8] INFO > registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-4 : > container_1525297086734_0013_01_06]: Deleting registry path > /users/root/services/yarn-service/simple-aa-1/components/ctr-1525297086734-0013-01-06 > 2018-05-03 20:17:38,476 [Component dispatcher] ERROR component.Component - > [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > CHECK_STABLE at STABLE > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.service.component.Component.handle(Component.java:913) > at > org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:574) > at > org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:563) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > 2018-05-03 20:17:38,480 [Component dispatcher] ERROR component.Component - > [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > CHECK_STABLE at STABLE > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state
[jira] [Commented] (YARN-8265) AM should retrieve new IP for restarted container
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472697#comment-16472697 ] Billie Rinaldi commented on YARN-8265: -- Patch 2 is allowing the AM to retrieve the new IP, but a couple more things need to be fixed. The logging when status is obtained is now too verbose and, more importantly, RegistryDNS still knows about the old IP, so it has 2 IPs for the container. > AM should retrieve new IP for restarted container > - > > Key: YARN-8265 > URL: https://issues.apache.org/jira/browse/YARN-8265 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Critical > Attachments: YARN-8265.001.patch, YARN-8265.002.patch > > > When a docker container is restarted, it gets a new IP, but the service AM > only retrieves one IP for a container and then cancels the container status > retriever. I suspect the issue would be solved by restarting the retriever > (if it has been canceled) when the onContainerRestart callback is received, > but we'll have to do some testing to make sure this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3610) FairScheduler: Add steady-fair-shares to the REST API documentation
[ https://issues.apache.org/jira/browse/YARN-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472696#comment-16472696 ] Hudson commented on YARN-3610: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14178 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14178/]) YARN-3610. FairScheduler: Add steady-fair-shares to the REST API (haibochen: rev 50408cfc6987b554f8f8f3d6711f7fa61c6e6d6f) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md > FairScheduler: Add steady-fair-shares to the REST API documentation > --- > > Key: YARN-3610 > URL: https://issues.apache.org/jira/browse/YARN-3610 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation, fairscheduler >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Ray Chiang >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-3610.001.patch, YARN-3610.002.patch, > YARN-3610.003.patch > > > YARN-1050 adds documentation for FairScheduler REST API, but is missing the > steady-fair-share. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8265) AM should retrieve new IP for restarted container
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472693#comment-16472693 ] Billie Rinaldi commented on YARN-8265: -- How to test this bug: # Run a simple app with a sleep and exit launch command. {noformat} { "name": "test-ip-change", "version": "1", "lifetime": "3600", "configuration": { "properties": { "docker.network": "bridge" } }, "components" : [ { "name": "centos7", "number_of_containers": 1, "artifact": { "id": "library/centos:7", "type": "DOCKER" }, "launch_command": "sleep 60; exit 1", "resource": { "cpus": 2, "memory": "1024" } } ] } {noformat} # Verify that the docker container has started running, and then run the following script (assuming you only have one docker container running in your test environment). This will grab the IP of the service's container as soon as that container goes down. Leave this docker container running. {noformat} while docker ps | grep container_ > /dev/null; do :; done; docker run -it library/centos:7 bash {noformat} # After the service's container is restarted, verify that it has a new IP. Before the patch is applied, verify the AM and RegistryDNS have not received the new IP of the container. After the patch is applied, verify that they do receive the new IP. > AM should retrieve new IP for restarted container > - > > Key: YARN-8265 > URL: https://issues.apache.org/jira/browse/YARN-8265 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Critical > Attachments: YARN-8265.001.patch, YARN-8265.002.patch > > > When a docker container is restarted, it gets a new IP, but the service AM > only retrieves one IP for a container and then cancels the container status > retriever. I suspect the issue would be solved by restarting the retriever > (if it has been canceled) when the onContainerRestart callback is received, > but we'll have to do some testing to make sure this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472688#comment-16472688 ] Billie Rinaldi commented on YARN-8141: -- [~leftnoteasy], the user should make them STATIC configuration files in that case, and the AM will automatically set up YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS. > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-4599: - Attachment: YARN-4599.006.patch > Set OOM control for memory cgroups > -- > > Key: YARN-4599 > URL: https://issues.apache.org/jira/browse/YARN-4599 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Miklos Szegedi >Priority: Major > Labels: oct16-medium > Attachments: YARN-4599.000.patch, YARN-4599.001.patch, > YARN-4599.002.patch, YARN-4599.003.patch, YARN-4599.004.patch, > YARN-4599.005.patch, YARN-4599.006.patch, YARN-4599.sandflee.patch, > yarn-4599-not-so-useful.patch > > > YARN-1856 adds memory cgroups enforcing support. We should also explicitly > set OOM control so that containers are not killed as soon as they go over > their usage. Today, one could set the swappiness to control this, but > clusters with swap turned off exist. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
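For context, the cgroups v1 knob behind "explicitly set OOM control" is the memory.oom_control file: writing 1 to it disables the kernel OOM killer for that cgroup, so tasks pause at the memory limit instead of being killed outright. A minimal illustration follows; the cgroup path layout is an assumed example, not necessarily what the patch does.
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public final class OomControlSketch {
  /**
   * Disable the per-cgroup OOM killer for a container's memory cgroup (cgroups v1).
   * Example path: /sys/fs/cgroup/memory/hadoop-yarn/container_.../memory.oom_control
   */
  public static void disableOomKiller(String containerCgroupDir) throws IOException {
    // "1" turns off the kernel OOM killer for this cgroup; the NM can then decide what to kill.
    Files.write(Paths.get(containerCgroupDir, "memory.oom_control"),
        "1".getBytes(StandardCharsets.UTF_8));
  }
}
{code}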
[jira] [Commented] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472679#comment-16472679 ] Eric Badger commented on YARN-8274: --- bq. With 3.1.1 code freeze on Saturday, it is easy to make mistakes, and I like to get YARN-7654 committed before end of today. YARN-7654 and YARN-8207 are probably left uncommitted for too long I understand that you want to get these patches into 3.1.1, but I don't believe we should rush to get features into releases and in the process compromise on quality. Rushed patches/reviews lead to bugs like this happening at an elevated rate. I'm also not particularly compelled by the argument that YARN-7654 and YARN-8207 have been uncommitted for too long. YARN-8207 ended up being a 127 kB patch of entirely C code, which is incredibly time-consuming to review, while YARN-7654 is now on patch number 23. It's not like these aren't getting reviewed; they are just going through a normal process of comprehensive review. I think that YARN-8207 getting committed in 2 weeks is a semi-miracle given the size, complexity, and possible ramifications of the changes. Reviewing that much C code (especially in a setuid binary) throughout 10 different patches is basically a full-time job. [~jlowe] has spent countless more hours/days than I think should be reasonably expected and is still working in an attempt to get these patches into 3.1.1. If anything, he should be commended and thanked for his yeoman’s effort here regardless of whether YARN-7654 makes it into 3.1.1. So, while I understand that deadlines exist and that we should strive to meet them, I don't believe that we should rush patches in solely because of a deadline. That destabilizes the project and causes more work for everyone. If a patch/feature isn't fully ready, we should step back and get it into the next release rather than cut time on reviews and possibly miss something. At the end of the day, if we are introducing bugs like this consistently, which recently we have been, then we are clearly iterating too quickly and need to spend more time on reviewing each patch instead of rushing them to be committed. > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8274.001.patch, YARN-8274.002.patch > > > I initiated container relaunch with a "sleep 60; exit 1" launch command and > saw a "not a docker command" error on relaunch. Haven't figured out why this > is happening, but it seems like it has been introduced recently to > trunk/branch-3.1. 
cc [~shaneku...@gmail.com] [~ebadger] > {noformat} > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Relaunch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1525897486447_0003_01_02 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor
[jira] [Created] (YARN-8282) [C] Create a JNI interface to interact with Windows
Giovanni Matteo Fumarola created YARN-8282: -- Summary: [C] Create a JNI interface to interact with Windows Key: YARN-8282 URL: https://issues.apache.org/jira/browse/YARN-8282 Project: Hadoop YARN Issue Type: Sub-task Reporter: Giovanni Matteo Fumarola Assignee: Giovanni Matteo Fumarola -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8265) AM should retrieve new IP for restarted container
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi reassigned YARN-8265: Assignee: Billie Rinaldi (was: Eric Yang) > AM should retrieve new IP for restarted container > - > > Key: YARN-8265 > URL: https://issues.apache.org/jira/browse/YARN-8265 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Critical > Attachments: YARN-8265.001.patch, YARN-8265.002.patch > > > When a docker container is restarted, it gets a new IP, but the service AM > only retrieves one IP for a container and then cancels the container status > retriever. I suspect the issue would be solved by restarting the retriever > (if it has been canceled) when the onContainerRestart callback is received, > but we'll have to do some testing to make sure this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472663#comment-16472663 ] Miklos Szegedi commented on YARN-8275: -- [~giovanni.fumarola], I am curious about your opinion on the design of YARN-4599. In that case we considered JNI vs. a long-running native process communicating with YARN over a pipe. The latter seems better in terms of security and maintainability in case some native functions start corrupting the JVM heap. There is only a single process start in that case, so it does not affect performance. What do you think? > Create a JNI interface to interact with Windows > --- > > Key: YARN-8275 > URL: https://issues.apache.org/jira/browse/YARN-8275 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: WinUtils-Functions.pdf, WinUtils.CSV > > > I did a quick investigation of the performance of WinUtils in YARN. On > average, the NM calls WinUtils 4.76 times per second and 65.51 times per container. > > | |Requests|Requests/sec|Requests/min|Requests/container| > |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*| > |[WinUtils] Execute -help|4148|0.145|8.769|2.007| > |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37| > |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43| > |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37| > |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05| > Interval: 7 hours, 53 minutes and 48 seconds > Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops. > This means *666.58* IO ops/second due to WinUtils. > We should start considering removing WinUtils from Hadoop and creating a JNI > interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
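To make the JNI side of the trade-off concrete, the Java-facing binding could be as small as the sketch below. The library and method names are purely illustrative, not a proposed design; they only show the kind of in-process calls that would replace forking winutils.exe for every request.
{code:java}
// Illustrative only: a minimal JNI facade the NM could call in-process instead of spawning
// winutils.exe for each symlink or liveness check. All names here are hypothetical.
public final class WindowsNativeOps {
  static {
    System.loadLibrary("hadoopwinops");  // assumed native library name
  }
  public static native void createSymlink(String link, String target);
  public static native boolean isProcessAlive(long pid);
  public static native String getSystemInfo();
}
{code}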
[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container
[ https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472661#comment-16472661 ] Eric Yang commented on YARN-7654: - [~jlowe] Patch 24 fixed the issues above. I still need time to test all 5 scenarios to make sure that command doesn't get pre-processed by mistake. The 5 scenarios are: # Mapreduce # LLAP app # Docker app with command override # Docker app with entry point # Docker app with entry point and no launch command > Support ENTRY_POINT for docker container > > > Key: YARN-7654 > URL: https://issues.apache.org/jira/browse/YARN-7654 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Labels: Docker > Attachments: YARN-7654.001.patch, YARN-7654.002.patch, > YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, > YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, > YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, > YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, > YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, > YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, > YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch, > YARN-7654.024.patch > > > Docker image may have ENTRY_POINT predefined, but this is not supported in > the current implementation. It would be nice if we can detect existence of > {{launch_command}} and base on this variable launch docker container in > different ways: > h3. Launch command exists > {code} > docker run [image]:[version] > docker exec [container_id] [launch_command] > {code} > h3. Use ENTRY_POINT > {code} > docker run [image]:[version] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3610) FairScheduler: Add steady-fair-shares to the REST API documentation
[ https://issues.apache.org/jira/browse/YARN-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472657#comment-16472657 ] Haibo Chen commented on YARN-3610: -- +1. Checking this in shortly. > FairScheduler: Add steady-fair-shares to the REST API documentation > --- > > Key: YARN-3610 > URL: https://issues.apache.org/jira/browse/YARN-3610 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation, fairscheduler >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Ray Chiang >Priority: Major > Attachments: YARN-3610.001.patch, YARN-3610.002.patch, > YARN-3610.003.patch > > > YARN-1050 adds documentation for FairScheduler REST API, but is missing the > steady-fair-share. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472653#comment-16472653 ] Suma Shivaprasad commented on YARN-8080: Fixed UT failure/some checkstyle issues > YARN native service should support component restart policy > --- > > Key: YARN-8080 > URL: https://issues.apache.org/jira/browse/YARN-8080 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8080.001.patch, YARN-8080.002.patch, > YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, > YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, > YARN-8080.011.patch, YARN-8080.012.patch > > > Existing native service assumes the service is long running and never > finishes. Containers will be restarted even if exit code == 0. > To support broader use cases, we need to allow the restart policy of a component > to be specified by users. Propose to have the following policies: > 1) Always: containers are always restarted by the framework regardless of container > exit status. This is the existing/default behavior. > 2) Never: Do not restart containers in any case after a container finishes: To > support job-like workloads (for example a Tensorflow training job). If a task > exits with code == 0, we should not restart the task. This can be used by > services which are not restart/recovery-able. > 3) On-failure: Similar to above, only restart tasks with exit code != 0. > Behaviors after a component *instance* finalizes (Succeeded or Failed when > restart_policy != ALWAYS): > 1) For a single component, single instance: complete the service. > 2) For a single component, multiple instances: other running instances from the > same component won't be affected by the finalized component instance. The service > will be terminated once all instances are finalized. > 3) For multiple components: The service will be terminated once all components are > finalized. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8080: --- Attachment: YARN-8080.012.patch > YARN native service should support component restart policy > --- > > Key: YARN-8080 > URL: https://issues.apache.org/jira/browse/YARN-8080 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8080.001.patch, YARN-8080.002.patch, > YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, > YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, > YARN-8080.011.patch, YARN-8080.012.patch > > > Existing native service assumes the service is long running and never > finishes. Containers will be restarted even if exit code == 0. > To support broader use cases, we need to allow the restart policy of a component > to be specified by users. Propose to have the following policies: > 1) Always: containers are always restarted by the framework regardless of container > exit status. This is the existing/default behavior. > 2) Never: Do not restart containers in any case after a container finishes: To > support job-like workloads (for example a Tensorflow training job). If a task > exits with code == 0, we should not restart the task. This can be used by > services which are not restart/recovery-able. > 3) On-failure: Similar to above, only restart tasks with exit code != 0. > Behaviors after a component *instance* finalizes (Succeeded or Failed when > restart_policy != ALWAYS): > 1) For a single component, single instance: complete the service. > 2) For a single component, multiple instances: other running instances from the > same component won't be affected by the finalized component instance. The service > will be terminated once all instances are finalized. > 3) For multiple components: The service will be terminated once all components are > finalized. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472652#comment-16472652 ] Wangda Tan commented on YARN-8141: -- [~shaneku...@gmail.com], an example is a user storing Hadoop configs in HDFS, which will be automatically localized to the container's local folder. YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS can then be used to mount them when the container is launched. This is useful for the rolling upgrade case, where the application should not read /etc/hadoop/conf directly. > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7654) Support ENTRY_POINT for docker container
[ https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7654: Attachment: YARN-7654.024.patch > Support ENTRY_POINT for docker container > > > Key: YARN-7654 > URL: https://issues.apache.org/jira/browse/YARN-7654 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Labels: Docker > Attachments: YARN-7654.001.patch, YARN-7654.002.patch, > YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, > YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, > YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, > YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, > YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, > YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, > YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch, > YARN-7654.024.patch > > > Docker image may have ENTRY_POINT predefined, but this is not supported in > the current implementation. It would be nice if we can detect existence of > {{launch_command}} and base on this variable launch docker container in > different ways: > h3. Launch command exists > {code} > docker run [image]:[version] > docker exec [container_id] [launch_command] > {code} > h3. Use ENTRY_POINT > {code} > docker run [image]:[version] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472631#comment-16472631 ] Billie Rinaldi commented on YARN-8141: -- I agree, the user does not need to set YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS. They should go through the configuration files API to get local resources mounted into a container. > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8265) AM should retrieve new IP for restarted container
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-8265: - Attachment: YARN-8265.002.patch > AM should retrieve new IP for restarted container > - > > Key: YARN-8265 > URL: https://issues.apache.org/jira/browse/YARN-8265 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Critical > Attachments: YARN-8265.001.patch, YARN-8265.002.patch > > > When a docker container is restarted, it gets a new IP, but the service AM > only retrieves one IP for a container and then cancels the container status > retriever. I suspect the issue would be solved by restarting the retriever > (if it has been canceled) when the onContainerRestart callback is received, > but we'll have to do some testing to make sure this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472621#comment-16472621 ] Shane Kumpf edited comment on YARN-8141 at 5/11/18 8:41 PM: {quote}I agree that at this time, it's better to not remove YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS {quote} Fair enough. the original environment variable does make the intent explicit for relative source paths. However, YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS won't work for '/etc/passwd' as originally described in this issue, this is what YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS is for. Can you help me understand the use case in allowing the user to provide YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS? I'm struggling to come up with a good use case for this. was (Author: shaneku...@gmail.com): {quote}I agree that at this time, it's better to not remove YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS {quote} Fair enough. the original environment variable does make the user intent explicit for relative source paths. However, YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS won't work for '/etc/passwd' as originally described in this issue, this is what YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS is for. Can you help me understand the use case in allowing the user to provide YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS? I'm struggling to come up with a good use case for this. > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472621#comment-16472621 ] Shane Kumpf commented on YARN-8141: --- {quote}I agree that at this time, it's better to not remove YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS {quote} Fair enough. the original environment variable does make the user intent explicit for relative source paths. However, YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS won't work for '/etc/passwd' as originally described in this issue, this is what YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS is for. Can you help me understand the use case in allowing the user to provide YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS? I'm struggling to come up with a good use case for this. > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472604#comment-16472604 ] Suma Shivaprasad commented on YARN-8080: Thanks [~csingh] Addressed review comments and merged with trunk > YARN native service should support component restart policy > --- > > Key: YARN-8080 > URL: https://issues.apache.org/jira/browse/YARN-8080 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8080.001.patch, YARN-8080.002.patch, > YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, > YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, > YARN-8080.011.patch > > > Existing native service assumes the service is long running and never > finishes. Containers will be restarted even if exit code == 0. > To support boarder use cases, we need to allow restart policy of component > specified by users. Propose to have following policies: > 1) Always: containers always restarted by framework regardless of container > exit status. This is existing/default behavior. > 2) Never: Do not restart containers in any cases after container finishes: To > support job-like workload (for example Tensorflow training job). If a task > exit with code == 0, we should not restart the task. This can be used by > services which is not restart/recovery-able. > 3) On-failure: Similar to above, only restart task with exitcode != 0. > Behaviors after component *instance* finalize (Succeeded or Failed when > restart_policy != ALWAYS): > 1) For single component, single instance: complete service. > 2) For single component, multiple instance: other running instances from the > same component won't be affected by the finalized component instance. Service > will be terminated once all instances finalized. > 3) For multiple components: Service will be terminated once all components > finalized. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8080: --- Attachment: YARN-8080.011.patch > YARN native service should support component restart policy > --- > > Key: YARN-8080 > URL: https://issues.apache.org/jira/browse/YARN-8080 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8080.001.patch, YARN-8080.002.patch, > YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, > YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, > YARN-8080.011.patch > > > Existing native service assumes the service is long running and never > finishes. Containers will be restarted even if exit code == 0. > To support boarder use cases, we need to allow restart policy of component > specified by users. Propose to have following policies: > 1) Always: containers always restarted by framework regardless of container > exit status. This is existing/default behavior. > 2) Never: Do not restart containers in any cases after container finishes: To > support job-like workload (for example Tensorflow training job). If a task > exit with code == 0, we should not restart the task. This can be used by > services which is not restart/recovery-able. > 3) On-failure: Similar to above, only restart task with exitcode != 0. > Behaviors after component *instance* finalize (Succeeded or Failed when > restart_policy != ALWAYS): > 1) For single component, single instance: complete service. > 2) For single component, multiple instance: other running instances from the > same component won't be affected by the finalized component instance. Service > will be terminated once all instances finalized. > 3) For multiple components: Service will be terminated once all components > finalized. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container
[ https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472597#comment-16472597 ] Jason Lowe commented on YARN-7654: -- Thanks for updating the patch! AbstractLauncher still has a setCommand method which is no longer necessary. DockerProviderService should call addCommand instead of setCommand. The only other place commands can be added to AbstractLauncher is in AbstractProviderService#buildLaunchCommand which DockerProviderService overrides. The try-with-resources was added to only one of the writeCommandToTempFile methods (yes, oddly there are two), and the one that matters was not the one that was updated. As a result the OutputStreamWriter is not closed if anything throws, and the printWriter is not closed if some exceptions are thrown. My apologies for missing this earlier. The clue was only one of them is checking for DockerRunCommand instances. My previous comment on putAll was misconstrued. I wasn't asking for DockerRunCommand to have a putAll method, rather that the addEnv method should be implemented in terms of userEnv.putAll. putAll is not a very useful method name for DockerRunCommand since it's not clear what "all" is. It's not clear that it's environment variables that are being added to the command. What I originally meant was for DockerRunCommand#addEnv to simply be this: {code:java} public final void addEnv(Map environment) { userEnv.putAll(environment); } {code} typo: "use_entry_pont" s/b "use_entry_point" > Support ENTRY_POINT for docker container > > > Key: YARN-7654 > URL: https://issues.apache.org/jira/browse/YARN-7654 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Labels: Docker > Attachments: YARN-7654.001.patch, YARN-7654.002.patch, > YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, > YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, > YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, > YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, > YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, > YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, > YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch > > > Docker image may have ENTRY_POINT predefined, but this is not supported in > the current implementation. It would be nice if we can detect existence of > {{launch_command}} and base on this variable launch docker container in > different ways: > h3. Launch command exists > {code} > docker run [image]:[version] > docker exec [container_id] [launch_command] > {code} > h3. Use ENTRY_POINT > {code} > docker run [image]:[version] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
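Jason's point about the writer not being closed when something throws comes down to widening the try-with-resources to cover every stream in the chain. Below is a simplified, hypothetical stand-in for writeCommandToTempFile; the real method's signature and surrounding class in the patch differ, this only shows the resource-handling pattern.
{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriteCommandSketch {
  // All three resources are declared in the try header, so each is closed in
  // reverse order even if println, or construction of a later resource, throws.
  static void writeCommandToTempFile(File cmdFile, String command)
      throws IOException {
    try (FileOutputStream fos = new FileOutputStream(cmdFile);
         Writer osw = new OutputStreamWriter(fos, StandardCharsets.UTF_8);
         PrintWriter printWriter = new PrintWriter(osw)) {
      printWriter.println(command);
    }
  }
}
{code}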
[jira] [Commented] (YARN-8248) Job hangs when a queue is specified and the maxResources of the queue cannot satisfy the AM resource request
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472598#comment-16472598 ] Haibo Chen commented on YARN-8248: -- Thanks [~snemeth] for updating the patch. I have a few more comments/questions 1) The AMResourceRequests of an application is already verified in RMAppManager. validateAndCreateResourceRequest(). We don't need to check what if it is null, inside fair scheduler any more, IMO. Effectively, what I am suggesting is removing {code:java} if (rmApp == null || rmApp.getAMResourceRequests() == null) { LOG.debug("rmApp or rmApp.AMResourceRequests was null!"); } {code} 2) Isn't Resources.isAnyMajorResourceZero(DOMINANT_RESOURCE_CALCULATOR, queueMaxShare)), already included in !Resources.fitsIn(amResourceRequest.getCapability(),queueMaxShare)? That is, if the queue max resource is 0, Resources.fitsIn(amResourceRequest.getCapability(),queueMaxShare) would always return false. We'd also need to update the diagnostic message accordingly. 3) We don't need to check again in FairScheduler.allocate() because it is always called after the APP is accepted, which would imply the check already passed. 4) It is not clear to me how testAppRejectedToQueueZeroCapacityOfResource() is different from testSchedulingRejectedToQueueZeroCapacityOfResource(). The former case includes the latter, doesn't it? If so, I'd propose we get rid of testSchedulingRejectedToQueueZeroCapacityOfResource() and associated tests. There is one other case not covered in unit tests. What if max Resource of a queue is not zero, but the AM resource request is larger than the maxResource? 5) Some minor issues: There is an unused import in FairScheduler.java; Let's rename processEvents()-> addApplication(), processAttempAddedEvent() -> addAppAttempt(); Some debug messages tend to describe what the code does. Interpreting the debug log without the code aside can be hard. A few suggestions: LOG.debug("Assignment of container on node " + node+ " is zero!"); -> LOG.debug("No container is allocated on node" + node); "Resource ask %s fits in available node resources %s, but the allocated container was null!" -> "Resource ask %s fits in available node resources %s, but no container was allocated" LOG.debug("Assign container precheck was false on node: " + node); -> LOG.debug("Assign container precheck on node " + node + " failed" ); > Job hangs when a queue is specified and the maxResources of the queue cannot > satisfy the AM resource request > > > Key: YARN-8248 > URL: https://issues.apache.org/jira/browse/YARN-8248 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, yarn >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8248-001.patch, YARN-8248-002.patch, > YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, > YARN-8248-006.patch > > > Job hangs when mapreduce.job.queuename is specified and the queue has 0 of > any resource (vcores / memory / other) > In this scenario, the job should be immediately rejected upon submission > since the specified queue cannot serve the resource needs of the submitted > job. 
> > Command to run: > {code:java} > bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code} > fair-scheduler.xml queue config (excerpt): > > {code:java} > > 1 mb,0vcores > 9 mb,0vcores > 50 > -1.0f > 2.0 > fair > > {code} > Diagnostic message from the web UI: > {code:java} > Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is > not yet activated. (Resource request: exceeds current > queue or its parents maximum resource allowed).{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
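Haibo's second point above is that a single Resources.fitsIn check already subsumes the zero-max-share case, because a non-zero AM ask can never fit into a queue whose max share is zero for some resource. A rough sketch of the consolidated check follows; Resource and Resources.fitsIn are the real YARN types named in the comment, but the helper method and how the scheduler obtains the queue max share are placeholders, not the actual FairScheduler code.
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class AmAskValidationSketch {
  // Returns false when the AM resource request cannot fit into the queue's
  // configured maximum share; the caller would then reject the application
  // with a diagnostic naming both the ask and the queue max resources.
  static boolean amAskFitsQueue(Resource amAsk, Resource queueMaxShare) {
    return Resources.fitsIn(amAsk, queueMaxShare);
  }
}
{code}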
[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container
[ https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472591#comment-16472591 ] genericqa commented on YARN-7654: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 32s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 17s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 11 new + 113 unchanged - 0 fixed = 124 total (was 113) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 46s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 4s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 36s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black}
[jira] [Commented] (YARN-8248) Job hangs when a queue is specified and the maxResources of the queue cannot satisfy the AM resource request
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472587#comment-16472587 ] genericqa commented on YARN-8248: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 39s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 31s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 246 unchanged - 0 fixed = 247 total (was 246) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 70m 57s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}124m 55s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8248 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923072/YARN-8248-006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 8d5e00debded 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / c1d64d6 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20704/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20704/testReport/ | | Max. process+thread count | 874 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourc
[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472584#comment-16472584 ] Chandni Singh commented on YARN-8141: - [~shaneku...@gmail.com] I don't see any issues with the approach you mentioned. However, I wanted to avoid making changes to Yarn core which seemed riskier. I can create another ticket to address consolidating {{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} and {{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} for the next release and work on it. Let me know if this sounds good to you? > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8243) Flex down should remove instance with largest component instance ID first
[ https://issues.apache.org/jira/browse/YARN-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472579#comment-16472579 ] Hudson commented on YARN-8243: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14177 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14177/]) YARN-8243. Flex down should remove instance with largest component (billie: rev ca612e353fc3e3766868ec0816de035e48b1f5b4) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceMaster.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/Component.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/instance/ComponentInstance.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestYarnNativeServices.java > Flex down should remove instance with largest component instance ID first > - > > Key: YARN-8243 > URL: https://issues.apache.org/jira/browse/YARN-8243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Gour Saha >Assignee: Gour Saha >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8243.01.patch, YARN-8243.02.patch > > > This is easy to test on a service with anti-affinity component, to simulate > pending container requests. It can be simulated by other means also (no > resource left in cluster, etc.). > Service yarnfile used to test this - > {code:java} > { > "name": "sleeper-service", > "version": "1", > "components" : > [ > { > "name": "ping", > "number_of_containers": 2, > "resource": { > "cpus": 1, > "memory": "256" > }, > "launch_command": "sleep 9000", > "placement_policy": { > "constraints": [ > { > "type": "ANTI_AFFINITY", > "scope": "NODE", > "target_tags": [ > "ping" > ] > } > ] > } > } > ] > } > {code} > Launch a service with the above yarnfile as below - > {code:java} > yarn app -launch simple-aa-1 simple_AA.json > {code} > Let's assume there are only 5 nodes in this cluster. Now, flex the above > service to 1 extra container than the number of nodes (6 in my case). > {code:java} > yarn app -flex simple-aa-1 -component ping 6 > {code} > Only 5 containers will be allocated and running for simple-aa-1. At this > point, flex it down to 5 containers - > {code:java} > yarn app -flex simple-aa-1 -component ping 5 > {code} > This is what is seen in the serviceam log at this point - > {noformat} > 2018-05-03 20:17:38,469 [IPC Server handler 0 on 38124] INFO > service.ClientAMService - Flexing component ping to 5 > 2018-05-03 20:17:38,469 [Component dispatcher] INFO component.Component - > [FLEX DOWN COMPONENT ping]: scaling down from 6 to 5 > 2018-05-03 20:17:38,470 [Component dispatcher] INFO > instance.ComponentInstance - [COMPINSTANCE ping-4 : > container_1525297086734_0013_01_06]: Flexed down by user, destroying. 
> 2018-05-03 20:17:38,473 [Component dispatcher] INFO component.Component - > [COMPONENT ping] Transitioned from FLEXING to STABLE on FLEX event. > 2018-05-03 20:17:38,474 [pool-5-thread-8] INFO > registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-4 : > container_1525297086734_0013_01_06]: Deleting registry path > /users/root/services/yarn-service/simple-aa-1/components/ctr-1525297086734-0013-01-06 > 2018-05-03 20:17:38,476 [Component dispatcher] ERROR component.Component - > [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > CHECK_STABLE at STABLE > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache
[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472573#comment-16472573 ] Wangda Tan commented on YARN-8141: -- Thanks [~csingh] / [~shaneku...@gmail.com] for the discussion. I agree that at this time, it's better to not remove YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS. As proposed by Chandni, we should simply merge computed by service framework and whatever provided by user. > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
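The merge Wangda agrees to above amounts to appending the framework-computed mounts to whatever the user already placed in the variable instead of overwriting it. A minimal sketch based on the snippet quoted in the issue description is below; the getOrDefault handling and method name are assumptions, not the committed YARN-8141 patch.
{code:java}
import java.util.Map;

public class MergeMountsSketch {
  static final String MOUNTS_ENV =
      "YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS";

  // Append framework-computed mounts to any user-specified value rather than
  // replacing it, which is the behavior this issue asks for.
  static void addLocalResourceMounts(Map<String, String> env,
      Map<String, String> mountPaths) {
    StringBuilder sb = new StringBuilder(env.getOrDefault(MOUNTS_ENV, ""));
    for (Map.Entry<String, String> mount : mountPaths.entrySet()) {
      if (sb.length() > 0) {
        sb.append(",");
      }
      sb.append(mount.getKey()).append(":").append(mount.getValue());
    }
    env.put(MOUNTS_ENV, sb.toString());
  }
}
{code}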
[jira] [Comment Edited] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472564#comment-16472564 ] Shane Kumpf edited comment on YARN-8141 at 5/11/18 7:59 PM: {quote}{{AbstractLauncher}} in Yarn service while creating the {{LaunchContext}} doesn't know about the absolute path of the localized resource. {quote} This is true, but that is the case with the current code path as well. Resolving the absolute path to the localized resource is handled on the runtime side. I believe we could still consolidate by having {{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} support both relative and absolute source paths (with the difference in behavior documented). If the source path is relative, it is assumed to be a localized resource and is then passed to validateMount which validates and returns the absolute path to that localized resource. If it an absolute path, the mount is added as-is. Do you see any issues with that approach? Does that still meet your original need [~leftnoteasy]? was (Author: shaneku...@gmail.com): {quote}{{AbstractLauncher}} in Yarn service while creating the {{LaunchContext}} doesn't know about the absolute path of the localized resource. \{quote} This is true, but that is the case with the current code path as well. Resolving the absolute path to the localized resource is handled on the runtime side. I believe we could still consolidate by having {{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} support both relative and absolute source paths (with the difference in behavior documented). If the source path is relative, it is assumed to be a localized resource and is then passed to validateMount which validates and returns the absolute path to that localized resource. If it an absolute path, the mount is added as-is. Do you see any issues with that approach? Does that still meet your original need [~leftnoteasy]? > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472564#comment-16472564 ] Shane Kumpf commented on YARN-8141: --- {quote}{{AbstractLauncher}} in Yarn service while creating the {{LaunchContext}} doesn't know about the absolute path of the localized resource. {quote} This is true, but that is the case with the current code path as well. Resolving the absolute path to the localized resource is handled on the runtime side. I believe we could still consolidate by having {{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} support both relative and absolute source paths (with the difference in behavior documented). If the source path is relative, it is assumed to be a localized resource and is then passed to validateMount which validates and returns the absolute path to that localized resource. If it is an absolute path, the mount is added as-is. Do you see any issues with that approach? Does that still meet your original need [~leftnoteasy]? > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
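Shane's consolidation idea distinguishes the two mount styles purely by whether the source half of a src:dst entry is an absolute path. A hedged sketch of that dispatch is below; resolveLocalizedResource is a hypothetical placeholder for the runtime's existing localized-resource validation (validateMount in the comment), and nothing here is taken from an actual patch.
{code:java}
import java.io.File;
import java.nio.file.Path;

public class MountSourceSketch {
  // Resolve the source half of a src:dst mount entry. Absolute host paths are
  // used as-is; relative paths are assumed to name a localized resource and
  // are resolved against the container work directory.
  static String resolveMountSource(String src, Path containerWorkDir) {
    if (new File(src).isAbsolute()) {
      return src;
    }
    return resolveLocalizedResource(containerWorkDir, src);
  }

  // Hypothetical stand-in for the runtime's localized-resource validation.
  static String resolveLocalizedResource(Path containerWorkDir, String name) {
    return containerWorkDir.resolve(name).toAbsolutePath().toString();
  }
}
{code}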
[jira] [Updated] (YARN-8248) Job hangs when a queue is specified and the maxResources of the queue cannot satisfy the AM resource request
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-8248: - Summary: Job hangs when a queue is specified and the maxResources of the queue cannot satisfy the AM resource request (was: Job hangs when queue is specified and that queue has 0 maxResources of a resource) > Job hangs when a queue is specified and the maxResources of the queue cannot > satisfy the AM resource request > > > Key: YARN-8248 > URL: https://issues.apache.org/jira/browse/YARN-8248 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, yarn >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8248-001.patch, YARN-8248-002.patch, > YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, > YARN-8248-006.patch > > > Job hangs when mapreduce.job.queuename is specified and the queue has 0 of > any resource (vcores / memory / other) > In this scenario, the job should be immediately rejected upon submission > since the specified queue cannot serve the resource needs of the submitted > job. > > Command to run: > {code:java} > bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code} > fair-scheduler.xml queue config (excerpt): > > {code:java} > > 1 mb,0vcores > 9 mb,0vcores > 50 > -1.0f > 2.0 > fair > > {code} > Diagnostic message from the web UI: > {code:java} > Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is > not yet activated. (Resource request: exceeds current > queue or its parents maximum resource allowed).{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472548#comment-16472548 ] Eric Yang commented on YARN-8274: - [~ebadger] Sorry my mistake, I thought the report was for the second patch. With 3.1.1 code freeze on Saturday, it is easy to make mistakes, and I like to get YARN-7654 committed before end of today. YARN-7654 and YARN-8207 are probably left uncommitted for too long, and it is easy to make mistakes to rebase changes that includes logic for other patches including YARN-7973, YARN-8209, YARN-8261, YARN-8064. I recommend to go through YARN-7654 to make sure the rebase was done correctly for those patches. > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8274.001.patch, YARN-8274.002.patch > > > I initiated container relaunch with a "sleep 60; exit 1" launch command and > saw a "not a docker command" error on relaunch. Haven't figured out why this > is happening, but it seems like it has been introduced recently to > trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger] > {noformat} > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Relaunch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1525897486447_0003_01_02 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception > message: Relaunch container failed > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error > output: docker: 'container_1525897486447_0003_01_02' is not a docker > command. 
> {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8244) TestContainerSchedulerQueuing.testStartMultipleContainers failed
[ https://issues.apache.org/jira/browse/YARN-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472532#comment-16472532 ] Hudson commented on YARN-8244: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14176 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14176/]) YARN-8244. TestContainerSchedulerQueuing.testStartMultipleContainers (jlowe: rev dc912994a1bcb511dfda32a0649cef0c9bdc47d3) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/TestContainerSchedulerQueuing.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java > TestContainerSchedulerQueuing.testStartMultipleContainers failed > - > > Key: YARN-8244 > URL: https://issues.apache.org/jira/browse/YARN-8244 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Jim Brennan >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3 > > Attachments: YARN-8244.001.patch, YARN-8244.002.patch > > > {code:java} > testStartMultipleContainers(org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing) > Time elapsed: 22.198 s <<< FAILURE! > java.lang.AssertionError: ContainerState is not correct (timedout) > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:344) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:309) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing.testStartMultipleContainers(TestContainerSchedulerQueuing.java:256) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413){code} > {code:java} > 2018-05-03 17:31:35,028 WARN [ContainersLauncher #1] launcher.ContainerLaunch > (ContainerLaunch.java:call(329)) - Failed to launch container. > java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextNode(HashMap.j
[jira] [Updated] (YARN-8248) Job hangs when queue is specified and that queue has 0 maxResources of a resource
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8248: - Summary: Job hangs when queue is specified and that queue has 0 maxResources of a resource (was: Job hangs when queue is specified and that queue has 0 capability of a resource) > Job hangs when queue is specified and that queue has 0 maxResources of a > resource > - > > Key: YARN-8248 > URL: https://issues.apache.org/jira/browse/YARN-8248 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, yarn >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8248-001.patch, YARN-8248-002.patch, > YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, > YARN-8248-006.patch > > > Job hangs when mapreduce.job.queuename is specified and the queue has 0 of > any resource (vcores / memory / other) > In this scenario, the job should be immediately rejected upon submission > since the specified queue cannot serve the resource needs of the submitted > job. > > Command to run: > {code:java} > bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code} > fair-scheduler.xml queue config (excerpt): > > {code:java} > > 1 mb,0vcores > 9 mb,0vcores > 50 > -1.0f > 2.0 > fair > > {code} > Diagnostic message from the web UI: > {code:java} > Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is > not yet activated. (Resource request: exceeds current > queue or its parents maximum resource allowed).{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472520#comment-16472520 ] Giovanni Matteo Fumarola commented on YARN-8275: [~elgoiri] thanks for the comment. I am planning to code everything in Commons to be used from YARN and HDFS. > Create a JNI interface to interact with Windows > --- > > Key: YARN-8275 > URL: https://issues.apache.org/jira/browse/YARN-8275 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: WinUtils-Functions.pdf, WinUtils.CSV > > > I did a quick investigation of the performance of WinUtils in YARN. In > average NM calls 4.76 times per second and 65.51 per container. > > | |Requests|Requests/sec|Requests/min|Requests/container| > |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*| > |[WinUtils] Execute -help|4148|0.145|8.769|2.007| > |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37| > |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43| > |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37| > |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05| > Interval: 7 hours, 53 minutes and 48 seconds > Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops. > This means *666.58* IO ops/second due to WinUtils. > We should start considering to remove WinUtils from Hadoop and creating a JNI > interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
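The table above argues for replacing per-call winutils.exe processes with in-process JNI calls. A sketch of what the Java side of such an interface might look like (illustrative class, method and library names only; the real design is what the attached WinUtils-Functions.pdf and the sub-tasks will define):
{code:java}
// Illustrative only: a thin Java wrapper whose native methods would replace a
// fork/exec of winutils.exe for the hottest operations (symlink, isAlive, ls).
public final class WindowsNative {

  static {
    System.loadLibrary("hadoopwinutils"); // assumed native library name
  }

  private WindowsNative() {
  }

  // One in-process JNI call instead of ~140 IO ops per winutils.exe execution.
  public static native void createSymlink(String link, String target);

  public static native boolean isTaskAlive(String taskName);

  public static native String[] listDirectory(String path);
}
{code}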
[jira] [Commented] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472511#comment-16472511 ] Eric Badger commented on YARN-8274: --- [~eyang], I appreciate you getting on and reviewing this JIRA quickly. However, could you give a little bit of time before committing so that other people can take a look? At the very least, you should wait for genericqa to come back. When you committed the latest patch, genericqa had only run on the 1st patch. > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8274.001.patch, YARN-8274.002.patch > > > I initiated container relaunch with a "sleep 60; exit 1" launch command and > saw a "not a docker command" error on relaunch. Haven't figured out why this > is happening, but it seems like it has been introduced recently to > trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger] > {noformat} > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Relaunch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1525897486447_0003_01_02 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception > message: Relaunch container failed > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error > output: docker: 'container_1525897486447_0003_01_02' is not a docker > command. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
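The shell error quoted above ("'container_...' is not a docker command") means the relaunch path invoked docker with the container id where the subcommand should be; the eventual fix landed in the native container-executor (docker-util.c). The intended command shape is simply the following, shown as a plain-Java illustration rather than the actual C builder:
{code:java}
import java.util.Arrays;
import java.util.List;

// Illustration only: the relaunch path must emit "docker start <container_id>".
// Dropping the "start" subcommand yields exactly the error in the log above.
public final class DockerRelaunchCommand {

  private DockerRelaunchCommand() {
  }

  public static List<String> build(String containerId) {
    return Arrays.asList("docker", "start", containerId);
  }
}
{code}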
[jira] [Commented] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472505#comment-16472505 ] genericqa commented on YARN-8274: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 38m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 38s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 40s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 73m 0s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8274 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923069/YARN-8274.002.patch | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux 533c7bbe3e6b 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 3a93af7 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/20703/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20703/testReport/ | | Max. process+thread count | 289 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20703/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8274.001.patch, YARN-8274.002.patch > >
[jira] [Commented] (YARN-8268) Fair scheduler: reservable queue is configured both as parent and leaf queue
[ https://issues.apache.org/jira/browse/YARN-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472502#comment-16472502 ] Hudson commented on YARN-8268: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14175 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14175/]) YARN-8268. Fair scheduler: reservable queue is configured both as parent (haibochen: rev 1f10a360219c91ac13d31bdb5c8d302b1b45afc3) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/allocation/AllocationFileQueueParser.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java > Fair scheduler: reservable queue is configured both as parent and leaf queue > > > Key: YARN-8268 > URL: https://issues.apache.org/jira/browse/YARN-8268 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8268.000.patch, YARN-8268.001.patch > > > The following allocation file > {code:java} > > > > someuser > > someuser > > > > > someuser > > > drf > > {code} > is being parsed as: {{PARENT=[root, root.dedicated], LEAF=[root.default, > root.dedicated]}} (AllocationConfiguration.configuredQueues). > The root.dedicated should only appear as a PARENT queue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
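The parsed result quoted in the description ({{PARENT=[root, root.dedicated], LEAF=[root.default, root.dedicated]}}) violates a simple invariant: no queue name should appear in both sets. A small plain-Java sketch of that invariant check (illustrative only, not the committed AllocationFileQueueParser change):
{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustration of the invariant described above: after parsing the allocation
// file, a reservable queue such as root.dedicated must only be a PARENT queue,
// so the PARENT and LEAF name sets must not overlap.
public final class QueueTypeInvariant {

  private QueueTypeInvariant() {
  }

  public static Set<String> queuesInBothSets(Map<String, Set<String>> configuredQueues) {
    Set<String> overlap = new HashSet<>(
        configuredQueues.getOrDefault("PARENT", Collections.emptySet()));
    overlap.retainAll(configuredQueues.getOrDefault("LEAF", Collections.emptySet()));
    return overlap; // expected empty; [root.dedicated] reproduces the reported bug
  }
}
{code}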
[jira] [Commented] (YARN-8244) TestContainerSchedulerQueuing.testStartMultipleContainers failed
[ https://issues.apache.org/jira/browse/YARN-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472493#comment-16472493 ] Jason Lowe commented on YARN-8244: -- Thanks for updating the patch! +1 lgtm. Committing this. > TestContainerSchedulerQueuing.testStartMultipleContainers failed > - > > Key: YARN-8244 > URL: https://issues.apache.org/jira/browse/YARN-8244 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-8244.001.patch, YARN-8244.002.patch > > > {code:java} > testStartMultipleContainers(org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing) > Time elapsed: 22.198 s <<< FAILURE! > java.lang.AssertionError: ContainerState is not correct (timedout) > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:344) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:309) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing.testStartMultipleContainers(TestContainerSchedulerQueuing.java:256) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125) > at > 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413){code} > {code:java} > 2018-05-03 17:31:35,028 WARN [ContainersLauncher #1] launcher.ContainerLaunch > (ContainerLaunch.java:call(329)) - Failed to launch container. > java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) > at java.util.HashMap$EntryIterator.next(HashMap.java:1471) > at java.util.HashMap$EntryIterator.next(HashMap.java:1469) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch$ShellScriptBuilder.orderEnvByDependencies(ContainerLaunch.java:1311) > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.writeLaunchEnv(ContainerExecutor.java:388) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:290) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101
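The second stack trace above is a ConcurrentModificationException thrown while orderEnvByDependencies iterates the environment map, the classic symptom of a HashMap being structurally modified during iteration. A standalone reproduction of that hazard (not YARN code):
{code:java}
import java.util.HashMap;
import java.util.Map;

// Standalone reproduction of the failure mode in the trace: putting into a
// HashMap while iterating its entry set throws ConcurrentModificationException
// on the next iterator step.
public final class CmeRepro {

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    env.put("PATH", "/bin");
    env.put("HOME", "/root");
    for (Map.Entry<String, String> entry : env.entrySet()) {
      env.put("COPY_" + entry.getKey(), entry.getValue()); // structural change mid-iteration
    }
    // Typical remedies: iterate a defensive copy (new HashMap<>(env)) or make
    // the writer and the iterating code share one lock / an immutable snapshot.
  }
}
{code}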
[jira] [Commented] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472477#comment-16472477 ] Hudson commented on YARN-8274: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14174 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14174/]) YARN-8274. Fixed a bug on docker start command.Contributed (eyang: rev 8f7912e0fee5de608ce8824fa5bd81b01b9a7c38) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8274.001.patch, YARN-8274.002.patch > > > I initiated container relaunch with a "sleep 60; exit 1" launch command and > saw a "not a docker command" error on relaunch. Haven't figured out why this > is happening, but it seems like it has been introduced recently to > trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger] > {noformat} > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Relaunch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1525897486447_0003_01_02 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception > message: Relaunch container failed > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error > output: docker: 'container_1525897486447_0003_01_02' is not a docker > command. 
> {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472476#comment-16472476 ] Íñigo Goiri commented on YARN-8275: --- Even though the main use should be in YARN for now, we should do the changes in Commons and eventually plug this in HDFS too. > Create a JNI interface to interact with Windows > --- > > Key: YARN-8275 > URL: https://issues.apache.org/jira/browse/YARN-8275 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: WinUtils-Functions.pdf, WinUtils.CSV > > > I did a quick investigation of the performance of WinUtils in YARN. In > average NM calls 4.76 times per second and 65.51 per container. > > | |Requests|Requests/sec|Requests/min|Requests/container| > |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*| > |[WinUtils] Execute -help|4148|0.145|8.769|2.007| > |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37| > |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43| > |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37| > |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05| > Interval: 7 hours, 53 minutes and 48 seconds > Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops. > This means *666.58* IO ops/second due to WinUtils. > We should start considering to remove WinUtils from Hadoop and creating a JNI > interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472460#comment-16472460 ] Giovanni Matteo Fumarola commented on YARN-8275: The attached file [^WinUtils-Functions.pdf] shows the current usage (inputs and outputs) of all the WinUtils functions. We should design a JNI interface aligned with it. > Create a JNI interface to interact with Windows > --- > > Key: YARN-8275 > URL: https://issues.apache.org/jira/browse/YARN-8275 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: WinUtils-Functions.pdf, WinUtils.CSV > > > I did a quick investigation of the performance of WinUtils in YARN. In > average NM calls 4.76 times per second and 65.51 per container. > > | |Requests|Requests/sec|Requests/min|Requests/container| > |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*| > |[WinUtils] Execute -help|4148|0.145|8.769|2.007| > |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37| > |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43| > |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37| > |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05| > Interval: 7 hours, 53 minutes and 48 seconds > Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops. > This means *666.58* IO ops/second due to WinUtils. > We should start considering to remove WinUtils from Hadoop and creating a JNI > interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8275: --- Attachment: WinUtils-Functions.pdf > Create a JNI interface to interact with Windows > --- > > Key: YARN-8275 > URL: https://issues.apache.org/jira/browse/YARN-8275 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: WinUtils-Functions.pdf, WinUtils.CSV > > > I did a quick investigation of the performance of WinUtils in YARN. In > average NM calls 4.76 times per second and 65.51 per container. > > | |Requests|Requests/sec|Requests/min|Requests/container| > |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*| > |[WinUtils] Execute -help|4148|0.145|8.769|2.007| > |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37| > |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43| > |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37| > |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05| > Interval: 7 hours, 53 minutes and 48 seconds > Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops. > This means *666.58* IO ops/second due to WinUtils. > We should start considering to remove WinUtils from Hadoop and creating a JNI > interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472450#comment-16472450 ] Chandni Singh edited comment on YARN-8141 at 5/11/18 6:36 PM: -- [~shaneku...@gmail.com] Thanks for pointing out the issue. {quote}As a result, the "source" of the mounts added in {{AbstractLauncher}} are all relative paths. These need to be specified as absolute paths in the final mount that is added to the {{docker run}} command {quote} I think it may be better not to get rid of {{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} at this point because {{AbstractLauncher}} in Yarn service while creating the {{LaunchContext}} doesn't know about the absolute path of the localized resource. I can put a simple fix to merge user specified {{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} with the once computed by yarn service for the issue reported by [~leftnoteasy] Let me know your thoughts? was (Author: csingh): [~shaneku...@gmail.com] Thanks for pointing out the issue. {quote}As a result, the "source" of the mounts added in {{AbstractLauncher}} are all relative paths. These need to be specified as absolute paths in the final mount that is added to the {{docker run}} command {quote} I think it may be better not to get rid of {{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} at this point because {{AbstractLauncher}} in Yarn service while creating the {{LaunchContext}} doesn't know about the absolute path of the localized resource. I can put a simple fix to merge user specified {{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} with the once computed by yarn service. Let me know your thoughts? > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8268) Fair scheduler: reservable queue is configured both as parent and leaf queue
[ https://issues.apache.org/jira/browse/YARN-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472452#comment-16472452 ] Haibo Chen commented on YARN-8268: -- Thanks [~grepas] for the contribution, [~wilfreds] for the additional review. I have checked in the patch to trunk > Fair scheduler: reservable queue is configured both as parent and leaf queue > > > Key: YARN-8268 > URL: https://issues.apache.org/jira/browse/YARN-8268 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8268.000.patch, YARN-8268.001.patch > > > The following allocation file > {code:java} > > > > someuser > > someuser > > > > > someuser > > > drf > > {code} > is being parsed as: {{PARENT=[root, root.dedicated], LEAF=[root.default, > root.dedicated]}} (AllocationConfiguration.configuredQueues). > The root.dedicated should only appear as a PARENT queue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472450#comment-16472450 ] Chandni Singh commented on YARN-8141: - [~shaneku...@gmail.com] Thanks for pointing out the issue. {quote}As a result, the "source" of the mounts added in {{AbstractLauncher}} are all relative paths. These need to be specified as absolute paths in the final mount that is added to the {{docker run}} command {quote} I think it may be better not to get rid of {{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} at this point, because {{AbstractLauncher}} in the YARN service framework doesn't know the absolute path of the localized resource while creating the {{LaunchContext}}. I can put up a simple fix to merge the user-specified {{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} with the ones computed by the YARN service. Let me know your thoughts. > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
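A minimal sketch of the merge proposed in the comment above: keep whatever the user put in YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS and append the mounts computed by the service framework instead of overwriting the variable (assumed helper, not the actual AbstractLauncher change):
{code:java}
import java.util.Map;

// Hypothetical merge, not the committed patch: preserve the user-specified
// mounts and append the service-computed "src:dst" pairs, comma separated.
public final class LocalResourceMountsMerge {

  private static final String MOUNTS_ENV =
      "YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS";

  private LocalResourceMountsMerge() {
  }

  public static void mergeMounts(Map<String, String> env, String computedMounts) {
    String userMounts = env.get(MOUNTS_ENV);
    if (userMounts == null || userMounts.isEmpty()) {
      env.put(MOUNTS_ENV, computedMounts);
    } else if (computedMounts != null && !computedMounts.isEmpty()) {
      env.put(MOUNTS_ENV, userMounts + "," + computedMounts);
    }
  }
}
{code}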
[jira] [Resolved] (YARN-8272) Several items are missing from Hadoop 3.1.0 documentation
[ https://issues.apache.org/jira/browse/YARN-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-8272. -- Resolution: Duplicate Closing as dup of HADOOP-15374 > Several items are missing from Hadoop 3.1.0 documentation > - > > Key: YARN-8272 > URL: https://issues.apache.org/jira/browse/YARN-8272 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Wangda Tan >Priority: Blocker > > From what I can see there're several missing items like GPU / FPGA: > http://hadoop.apache.org/docs/current/ > We should add them to hadoop-project/src/site/site.xml in the next release. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8272) Several items are missing from Hadoop 3.1.0 documentation
[ https://issues.apache.org/jira/browse/YARN-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472447#comment-16472447 ] Wangda Tan commented on YARN-8272: -- Oh sweet! Thanks [~ajisakaa] and [~tasanuma0829]! > Several items are missing from Hadoop 3.1.0 documentation > - > > Key: YARN-8272 > URL: https://issues.apache.org/jira/browse/YARN-8272 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Wangda Tan >Priority: Blocker > > From what I can see there're several missing items like GPU / FPGA: > http://hadoop.apache.org/docs/current/ > We should add them to hadoop-project/src/site/site.xml in the next release. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8258) [UI2] New UI webappcontext should inherit all filters from default context
[ https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8258: - Reporter: Sumana Sathish (was: Sunil G) > [UI2] New UI webappcontext should inherit all filters from default context > -- > > Key: YARN-8258 > URL: https://issues.apache.org/jira/browse/YARN-8258 > Project: Hadoop YARN > Issue Type: Improvement > Components: webapp >Reporter: Sumana Sathish >Assignee: Sunil G >Priority: Major > Attachments: YARN-8258.001.patch > > > Thanks [~ssath...@hortonworks.com] for finding this. > Ideally all filters from default context has to be inherited to UI2 context > as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
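A sketch of the filter inheritance the issue asks for, written against the plain Jetty servlet API rather than Hadoop's HttpServer2 wrapper (assumed approach, not the attached patch): copy each filter registered on the default context onto the UI2 web app context so both are guarded by the same authentication and header filters.
{code:java}
import java.util.EnumSet;

import javax.servlet.DispatcherType;

import org.eclipse.jetty.servlet.FilterHolder;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.webapp.WebAppContext;

// Assumed approach, not the committed patch: re-register every filter from the
// default context (same filter class and init parameters) on the UI2 context.
public final class InheritFilters {

  private InheritFilters() {
  }

  public static void copyFilters(ServletContextHandler defaultContext, WebAppContext ui2Context) {
    FilterHolder[] filters = defaultContext.getServletHandler().getFilters();
    if (filters == null) {
      return;
    }
    for (FilterHolder holder : filters) {
      FilterHolder copy = new FilterHolder(holder.getHeldClass());
      copy.setInitParameters(holder.getInitParameters());
      ui2Context.addFilter(copy, "/*", EnumSet.allOf(DispatcherType.class));
    }
  }
}
{code}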
[jira] [Updated] (YARN-8280) [UI2] GPU is not present in the drop down box under 'Nodes Heatmap'
[ https://issues.apache.org/jira/browse/YARN-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8280: - Reporter: Sumana Sathish (was: Gergely Novák) > [UI2] GPU is not present in the drop down box under 'Nodes Heatmap' > --- > > Key: YARN-8280 > URL: https://issues.apache.org/jira/browse/YARN-8280 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Sumana Sathish >Assignee: Gergely Novák >Priority: Major > Attachments: YARN-8280.001.patch > > > 1. Click on Nodes Heatmap Chart under Nodes Tab. > 2. No option to select GPU is available under the drop down menu. > Discovered by [~ssath...@hortonworks.com]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8268) Fair scheduler: reservable queue is configured both as parent and leaf queue
[ https://issues.apache.org/jira/browse/YARN-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472423#comment-16472423 ] Haibo Chen commented on YARN-8268: -- +1. Checking this in shortly. > Fair scheduler: reservable queue is configured both as parent and leaf queue > > > Key: YARN-8268 > URL: https://issues.apache.org/jira/browse/YARN-8268 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8268.000.patch, YARN-8268.001.patch > > > The following allocation file > {code:java} > > > > someuser > > someuser > > > > > someuser > > > drf > > {code} > is being parsed as: {{PARENT=[root, root.dedicated], LEAF=[root.default, > root.dedicated]}} (AllocationConfiguration.configuredQueues). > The root.dedicated should only appear as a PARENT queue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8274: Fix Version/s: 3.1.1 3.2.0 > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8274.001.patch, YARN-8274.002.patch > > > I initiated container relaunch with a "sleep 60; exit 1" launch command and > saw a "not a docker command" error on relaunch. Haven't figured out why this > is happening, but it seems like it has been introduced recently to > trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger] > {noformat} > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Relaunch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1525897486447_0003_01_02 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception > message: Relaunch container failed > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error > output: docker: 'container_1525897486447_0003_01_02' is not a docker > command. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications
[ https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472408#comment-16472408 ] genericqa commented on YARN-8130: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 17s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 19s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 76m 27s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8130 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923057/YARN-8130.03.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux eaf463b014d7 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d50c4d7 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20702/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/20702/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hado
[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container
[ https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472407#comment-16472407 ] Eric Yang commented on YARN-7654: - [~jlowe] Patch 23 includes all your suggestions. > Support ENTRY_POINT for docker container > > > Key: YARN-7654 > URL: https://issues.apache.org/jira/browse/YARN-7654 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Labels: Docker > Attachments: YARN-7654.001.patch, YARN-7654.002.patch, > YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, > YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, > YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, > YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, > YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, > YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, > YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch > > > Docker image may have ENTRY_POINT predefined, but this is not supported in > the current implementation. It would be nice if we can detect existence of > {{launch_command}} and base on this variable launch docker container in > different ways: > h3. Launch command exists > {code} > docker run [image]:[version] > docker exec [container_id] [launch_command] > {code} > h3. Use ENTRY_POINT > {code} > docker run [image]:[version] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
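The two launch modes in the description reduce to a branch on whether the service spec supplies a launch command. An illustrative plain-Java sketch of that branching (assumed names, not the patch itself):
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration of the two modes described above: with a launch_command the
// container is started and the command is exec'ed inside it; without one, the
// image's predefined ENTRY_POINT is trusted and only "docker run" is issued.
public final class DockerLaunchMode {

  private DockerLaunchMode() {
  }

  public static List<List<String>> buildInvocations(String image, String version,
      String containerId, String launchCommand) {
    List<List<String>> invocations = new ArrayList<>();
    invocations.add(Arrays.asList("docker", "run", image + ":" + version));
    if (launchCommand != null && !launchCommand.isEmpty()) {
      invocations.add(Arrays.asList("docker", "exec", containerId, launchCommand));
    }
    return invocations;
  }
}
{code}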
[jira] [Updated] (YARN-7654) Support ENTRY_POINT for docker container
[ https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7654: Attachment: YARN-7654.023.patch > Support ENTRY_POINT for docker container > > > Key: YARN-7654 > URL: https://issues.apache.org/jira/browse/YARN-7654 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Blocker > Labels: Docker > Attachments: YARN-7654.001.patch, YARN-7654.002.patch, > YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, > YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, > YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, > YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, > YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, > YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, > YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch > > > Docker image may have ENTRY_POINT predefined, but this is not supported in > the current implementation. It would be nice if we can detect existence of > {{launch_command}} and base on this variable launch docker container in > different ways: > h3. Launch command exists > {code} > docker run [image]:[version] > docker exec [container_id] [launch_command] > {code} > h3. Use ENTRY_POINT > {code} > docker run [image]:[version] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8248) Job hangs when queue is specified and that queue has 0 capability of a resource
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472399#comment-16472399 ] Szilard Nemeth commented on YARN-8248: -- Thanks [~haibochen] for your answers. It makes sense now why you wanted to remove changes from Resources. Also changed the scope of the debug logs and just kept those that are "edge" cases and come up more rarely. About the 3rd point: I checked the writeLock's scope but it cannot be reduced, since the following 2 lines must be called while the writeLock is held: {code:java} RMApp rmApp = rmContext.getRMApps().get(applicationId); FSLeafQueue queue = assignToQueue(rmApp, queueName, user); {code} Please check my updated patch! Thanks! > Job hangs when queue is specified and that queue has 0 capability of a > resource > --- > > Key: YARN-8248 > URL: https://issues.apache.org/jira/browse/YARN-8248 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, yarn >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8248-001.patch, YARN-8248-002.patch, > YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, > YARN-8248-006.patch > > > Job hangs when mapreduce.job.queuename is specified and the queue has 0 of > any resource (vcores / memory / other) > In this scenario, the job should be immediately rejected upon submission > since the specified queue cannot serve the resource needs of the submitted > job. > > Command to run: > {code:java} > bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code} > fair-scheduler.xml queue config (excerpt): > > {code:java} > > 1 mb,0vcores > 9 mb,0vcores > 50 > -1.0f > 2.0 > fair > > {code} > Diagnostic message from the web UI: > {code:java} > Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is > not yet activated. (Resource request: exceeds current > queue or its parents maximum resource allowed).{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8281) [Java] Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8281: --- Description: This JIRA tracks the design/implementation of the Java layer for the JNI interface to interact with Windows. > [Java] Create a JNI interface to interact with Windows > -- > > Key: YARN-8281 > URL: https://issues.apache.org/jira/browse/YARN-8281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > > This JIRA tracks the design/implementation of the Java layer for the JNI > interface to interact with Windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8248) Job hangs when queue is specified and that queue has 0 capability of a resource
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8248: - Attachment: YARN-8248-006.patch > Job hangs when queue is specified and that queue has 0 capability of a > resource > --- > > Key: YARN-8248 > URL: https://issues.apache.org/jira/browse/YARN-8248 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, yarn >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8248-001.patch, YARN-8248-002.patch, > YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, > YARN-8248-006.patch > > > Job hangs when mapreduce.job.queuename is specified and the queue has 0 of > any resource (vcores / memory / other) > In this scenario, the job should be immediately rejected upon submission > since the specified queue cannot serve the resource needs of the submitted > job. > > Command to run: > {code:java} > bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code} > fair-scheduler.xml queue config (excerpt): > > {code:java} > > 1 mb,0vcores > 9 mb,0vcores > 50 > -1.0f > 2.0 > fair > > {code} > Diagnostic message from the web UI: > {code:java} > Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is > not yet activated. (Resource request: exceeds current > queue or its parents maximum resource allowed).{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8281) [Java] Create a JNI interface to interact with Windows
Giovanni Matteo Fumarola created YARN-8281: -- Summary: [Java] Create a JNI interface to interact with Windows Key: YARN-8281 URL: https://issues.apache.org/jira/browse/YARN-8281 Project: Hadoop YARN Issue Type: Sub-task Reporter: Giovanni Matteo Fumarola Assignee: Giovanni Matteo Fumarola -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472392#comment-16472392 ] Eric Yang commented on YARN-8274: - Sorry the code was missed during refactoring. +1 The change looks good. > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-8274.001.patch, YARN-8274.002.patch > > > I initiated container relaunch with a "sleep 60; exit 1" launch command and > saw a "not a docker command" error on relaunch. Haven't figured out why this > is happening, but it seems like it has been introduced recently to > trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger] > {noformat} > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Relaunch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1525897486447_0003_01_02 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception > message: Relaunch container failed > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error > output: docker: 'container_1525897486447_0003_01_02' is not a docker > command. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications
[ https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472387#comment-16472387 ] Haibo Chen commented on YARN-8130: -- +1 pending jenkins. > Race condition when container events are published for KILLED applications > -- > > Key: YARN-8130 > URL: https://issues.apache.org/jira/browse/YARN-8130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Reporter: Charan Hebri >Assignee: Rohith Sharma K S >Priority: Major > Attachments: YARN-8130.01.patch, YARN-8130.02.patch, > YARN-8130.03.patch > > > There seems to be a race condition happening when an application is KILLED > and the corresponding container event information is being published. For > completed containers, a YARN_CONTAINER_FINISHED event is generated but for > some containers in a KILLED application this information is missing. Below is > a node manager log snippet, > {code:java} > 2018-04-09 08:44:54,474 INFO shuffle.ExternalShuffleBlockResolver > (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application > application_1523259757659_0003 removed, cleanupLocalDirs = false > 2018-04-09 08:44:54,478 INFO application.ApplicationImpl > (ApplicationImpl.java:handle(632)) - Application > application_1523259757659_0003 transitioned from > APPLICATION_RESOURCES_CLEANINGUP to FINISHED > 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher > (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been > removed before the entity could be published for > TimelineEntity[type='YARN_CONTAINER', > id='container_1523259757659_0003_01_02'] > 2018-04-09 08:44:54,478 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just > finished : application_1523259757659_0003 > 2018-04-09 08:44:54,488 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_01. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:54,492 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_02. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:55,470 INFO collector.TimelineCollectorManager > (TimelineCollectorManager.java:remove(192)) - The collector service for > application_1523259757659_0003 was removed > 2018-04-09 08:44:55,472 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:handle(1572)) - couldn't find application > application_1523259757659_0003 while processing FINISH_APPS event. The > ResourceManager allocated resources for this application to the NodeManager > but no active containers were found to process{code} > The container id specified in the log, > *container_1523259757659_0003_01_02* is the one that has the finished > event missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
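For illustration only (this is not the YARN-8130 patch), the following minimal, self-contained Java sketch shows the kind of ordering race the description above refers to: one thread removes the application's timeline collector during cleanup while another thread is still trying to publish the container's FINISHED entity, so the publish finds no collector and the event is dropped. The class, map, and method names are hypothetical stand-ins, not the actual NMTimelinePublisher/TimelineCollectorManager code.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-ins for the NM timeline publisher and collector manager;
// this only demonstrates the ordering problem, not the real YARN classes.
public class CollectorRaceSketch {

  static final ConcurrentMap<String, Object> collectors = new ConcurrentHashMap<>();

  // Publisher side: looks up the app's collector before putting an entity.
  static void publishContainerFinished(String appId, String containerId) {
    Object collector = collectors.get(appId);
    if (collector == null) {
      // Mirrors the NM log line: the collector is already gone, so the
      // YARN_CONTAINER_FINISHED entity is silently lost.
      System.out.println("collector removed before entity could be published for " + containerId);
      return;
    }
    System.out.println("published FINISHED event for " + containerId);
  }

  public static void main(String[] args) throws InterruptedException {
    String appId = "application_1523259757659_0003";
    collectors.put(appId, new Object());

    Thread publisher = new Thread(() ->
        publishContainerFinished(appId, "container_1523259757659_0003_01_02"));
    Thread appCleanup = new Thread(() -> collectors.remove(appId));

    // Depending on which thread wins, the FINISHED event is published or dropped.
    appCleanup.start();
    publisher.start();
    appCleanup.join();
    publisher.join();
  }
}
{code}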
[jira] [Commented] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472371#comment-16472371 ] genericqa commented on YARN-8274: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 33s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 37m 36s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 18s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 71m 53s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8274 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923054/YARN-8274.001.patch | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux 47161a3a09cf 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a922b9c | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/20701/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20701/testReport/ | | Max. process+thread count | 335 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20701/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-8274.001.patch, YARN-8274.002.patch > > > I initiated container relaunch with
[jira] [Updated] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8274: - Attachment: YARN-8274.002.patch > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-8274.001.patch, YARN-8274.002.patch > > > I initiated container relaunch with a "sleep 60; exit 1" launch command and > saw a "not a docker command" error on relaunch. Haven't figured out why this > is happening, but it seems like it has been introduced recently to > trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger] > {noformat} > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Relaunch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1525897486447_0003_01_02 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception > message: Relaunch container failed > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error > output: docker: 'container_1525897486447_0003_01_02' is not a docker > command. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472345#comment-16472345 ] Jason Lowe commented on YARN-8274: -- I think I found the bug with the container relaunch and the unrecognized command. get_docker_start_command is not adding the docker binary to the first argument, so argv[0] == "start" rather than "/usr/bin/docker" and docker is looking at argv[1] to determine the command name, and that's a container ID rather than a docker command. I'll update the patch shortly. > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-8274.001.patch > > > I initiated container relaunch with a "sleep 60; exit 1" launch command and > saw a "not a docker command" error on relaunch. Haven't figured out why this > is happening, but it seems like it has been introduced recently to > trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger] > {noformat} > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Relaunch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1525897486447_0003_01_02 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception > message: Relaunch container failed > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error > output: docker: 'container_1525897486447_0003_01_02' is not a docker > command. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
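The actual change is in the native container-executor (get_docker_start_command lives in the C code), but the argv ordering problem described above can be sketched in Java. The helper names and the /usr/bin/docker path below are illustrative assumptions, not the patch itself: without the docker binary as the first element, the container id shifts into the subcommand position and docker rejects it.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the argv ordering bug described above; the real fix
// is in the native container-executor, not in Java.
public class DockerStartArgvSketch {

  // Buggy shape: the docker binary is missing, so argv[0] is "start" and the
  // container id ends up where docker expects the subcommand, producing
  // "'container_...' is not a docker command".
  static List<String> buggyStartCommand(String containerId) {
    List<String> argv = new ArrayList<>();
    argv.add("start");
    argv.add(containerId);
    return argv;
  }

  // Fixed shape: the docker binary (e.g. /usr/bin/docker) is argv[0], "start"
  // is the subcommand, and the container id follows.
  static List<String> fixedStartCommand(String dockerBinary, String containerId) {
    List<String> argv = new ArrayList<>();
    argv.add(dockerBinary);
    argv.add("start");
    argv.add(containerId);
    return argv;
  }

  public static void main(String[] args) {
    String containerId = "container_1525897486447_0003_01_02";
    System.out.println("buggy argv: " + buggyStartCommand(containerId));
    System.out.println("fixed argv: " + fixedStartCommand("/usr/bin/docker", containerId));
    // e.g. new ProcessBuilder(fixedStartCommand("/usr/bin/docker", containerId)).start();
  }
}
{code}
Running the main method prints both argv shapes; the buggy one matches the "is not a docker command" error in the original report.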
[jira] [Commented] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472343#comment-16472343 ] Billie Rinaldi commented on YARN-8274: -- Okay, it does look like my container is properly relaunched when YARN-8207 isn't applied. > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Attachments: YARN-8274.001.patch > > > I initiated container relaunch with a "sleep 60; exit 1" launch command and > saw a "not a docker command" error on relaunch. Haven't figured out why this > is happening, but it seems like it has been introduced recently to > trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger] > {noformat} > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Relaunch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1525897486447_0003_01_02 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception > message: Relaunch container failed > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error > output: docker: 'container_1525897486447_0003_01_02' is not a docker > command. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8248) Job hangs when queue is specified and that queue has 0 capability of a resource
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472335#comment-16472335 ] Haibo Chen commented on YARN-8248: -- Thanks [~snemeth] for the clarification. I think in general it's okay to do some minor code/style cleanup around the same area while working on a patch. It becomes overhead, however, if the minor changes cause confusion or are too remote from your core change. Folks reading the commit history would also have questions without digging into the Jira discussion. Hence, I'd prefer, in this case, to leave Resources as is. If you can find many cleanup issues, a separate patch is justifiable. I agree with you that debug logs help debugging, especially on the unhappy/abnormal code path. However, if we add too much logging on the happy hot code path, the logs will be flooded. > Job hangs when queue is specified and that queue has 0 capability of a > resource > --- > > Key: YARN-8248 > URL: https://issues.apache.org/jira/browse/YARN-8248 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, yarn >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8248-001.patch, YARN-8248-002.patch, > YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch > > > Job hangs when mapreduce.job.queuename is specified and the queue has 0 of > any resource (vcores / memory / other) > In this scenario, the job should be immediately rejected upon submission > since the specified queue cannot serve the resource needs of the submitted > job. > > Command to run: > {code:java} > bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code} > fair-scheduler.xml queue config (excerpt): > > {code:java} > > 1 mb,0vcores > 9 mb,0vcores > 50 > -1.0f > 2.0 > fair > > {code} > Diagnostic message from the web UI: > {code:java} > Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is > not yet activated. (Resource request: exceeds current > queue or its parents maximum resource allowed).{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
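On the logging point in the comment above, a common way to keep debug output off the happy hot path is to log at DEBUG with parameterized messages and to guard only expensive message construction. A minimal SLF4J-style sketch follows; the class and method names are hypothetical and are not part of the YARN-8248 patch.
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical example class; only the logging pattern is the point here.
public class QueueValidationLogging {

  private static final Logger LOG = LoggerFactory.getLogger(QueueValidationLogging.class);

  void onResourceRequest(String queueName, long memoryMb, int vcores) {
    // Parameterized DEBUG logging: the message is only formatted when debug
    // is enabled, so the happy path stays cheap and the logs stay quiet.
    LOG.debug("Resource request for queue {}: {} MB, {} vcores", queueName, memoryMb, vcores);

    // Guard only when building the message argument itself is expensive.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Queue snapshot: {}", buildExpensiveSnapshot(queueName));
    }

    // The unhappy/abnormal path is where more verbose detail pays off.
    if (memoryMb == 0 || vcores == 0) {
      LOG.warn("Queue {} cannot satisfy a request of {} MB / {} vcores", queueName, memoryMb, vcores);
    }
  }

  private String buildExpensiveSnapshot(String queueName) {
    return "snapshot-of-" + queueName;
  }
}
{code}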