[jira] [Commented] (YARN-8133) Doc link broken for yarn-service from overview page.
[ https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431753#comment-16431753 ]

genericqa commented on YARN-8133:
---------------------------------

| (/) +1 overall |

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec       |  0m 38s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author      |  0m  0s | The patch does not contain any @author tags. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall   | 24m 20s | trunk passed |
| +1 | mvnsite      |  0m 16s | trunk passed |
| +1 | shadedclient | 33m 20s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvnsite      |  0m 15s | the patch passed |
| +1 | whitespace   |  0m  0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 30s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| +1 | asflicense   |  0m 20s | The patch does not generate ASF License warnings. |
|    |              | 45m 26s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8133 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918311/YARN-8133.02.patch |
| Optional Tests | asflicense mvnsite |
| uname | Linux c31446c34a67 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0006346 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 409 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20283/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Doc link broken for yarn-service from overview page.
> ----------------------------------------------------
>
>                 Key: YARN-8133
>                 URL: https://issues.apache.org/jira/browse/YARN-8133
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>    Affects Versions: 3.1.0
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Blocker
>         Attachments: YARN-8133.01.patch, YARN-8133.02.patch
>
>
> I see that the documentation links are broken on the overview page. Clicking
> any link on
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
> causes an error. It looks like the Overview page links to .md pages, which
> don't exist; it should link to the corresponding *.html pages.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8137) Parallelize node addition in SLS
[ https://issues.apache.org/jira/browse/YARN-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431712#comment-16431712 ]

genericqa commented on YARN-8137:
---------------------------------

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec       |  0m 42s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author      |  0m  0s | The patch does not contain any @author tags. |
| -1 | test4tests   |  0m  0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall   | 24m 37s | trunk passed |
| +1 | compile      |  0m 22s | trunk passed |
| +1 | checkstyle   |  0m 16s | trunk passed |
| +1 | mvnsite      |  0m 24s | trunk passed |
| +1 | shadedclient |  9m 48s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs     |  0m 32s | trunk passed |
| +1 | javadoc      |  0m 17s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall   |  0m 23s | the patch passed |
| +1 | compile      |  0m 18s | the patch passed |
| +1 | javac        |  0m 18s | the patch passed |
| +1 | checkstyle   |  0m 10s | the patch passed |
| +1 | mvnsite      |  0m 21s | the patch passed |
| +1 | whitespace   |  0m  0s | The patch has no whitespace issues. |
| +1 | shadedclient |  9m 54s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs     |  0m 41s | the patch passed |
| +1 | javadoc      |  0m 15s | the patch passed |
|| || || || Other Tests ||
| +1 | unit         | 10m 17s | hadoop-sls in the patch passed. |
| +1 | asflicense   |  0m 21s | The patch does not generate ASF License warnings. |
|    |              | 59m 46s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8137 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918307/YARN-8137.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux ce62db1c3cdf 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0006346 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20282/testReport/ |
| Max. process+thread count | 469 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20282/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Parallelize node addition in SLS
> --------------------------------
>
>
[jira] [Updated] (YARN-7930) Add configuration to initialize RM with configured labels.
[ https://issues.apache.org/jira/browse/YARN-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Modi updated YARN-7930:
--------------------------------
    Attachment: YARN-7930.004.patch

> Add configuration to initialize RM with configured labels.
> ----------------------------------------------------------
>
>                 Key: YARN-7930
>                 URL: https://issues.apache.org/jira/browse/YARN-7930
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Abhishek Modi
>            Assignee: Abhishek Modi
>            Priority: Major
>         Attachments: YARN-7930.001.patch, YARN-7930.002.patch,
> YARN-7930.003.patch, YARN-7930.004.patch
>
>
> At present, the only way to create node labels is via the admin API. Sometimes
> there is a requirement to start the cluster with pre-configured node labels.
> This JIRA introduces YARN configuration properties to start the RM with
> predefined node labels.
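For readers following along: a configuration of the kind this JIRA describes would live in yarn-site.xml on the ResourceManager. The sketch below is illustrative only; the second property name is an assumption, not the name defined by the YARN-7930 patch (only `yarn.node-labels.enabled` is an existing YARN property):

```xml
<!-- Enable node labels on the RM (existing YARN property). -->
<property>
  <name>yarn.node-labels.enabled</name>
  <value>true</value>
</property>

<!-- Hypothetical property sketching the feature described in YARN-7930: -->
<!-- a comma-separated list of labels to create automatically at RM startup, -->
<!-- so no admin API call is needed. The actual name comes from the patch. -->
<property>
  <name>yarn.node-labels.configured-node-labels</name>
  <value>GPU,SSD</value>
</property>
```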
[jira] [Commented] (YARN-8126) [Follow up] Support auto-spawning of admin configured services during bootstrap of rm
[ https://issues.apache.org/jira/browse/YARN-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431705#comment-16431705 ]

Rohith Sharma K S commented on YARN-8126:
-----------------------------------------

bq. If I understand correctly yarn.service.system-service.dir is a cluster-specific config, right?
Yes. How about adding a new section, Quick start?

> [Follow up] Support auto-spawning of admin configured services during
> bootstrap of rm
> ---------------------------------------------------------------------
>
>                 Key: YARN-8126
>                 URL: https://issues.apache.org/jira/browse/YARN-8126
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Major
>         Attachments: YARN-8126.001.patch
>
>
> YARN-8048 adds support for auto-spawning of admin-configured services during
> bootstrap of the RM. This JIRA is to follow up on some of the comments
> discussed in YARN-8048.
[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services
[ https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431701#comment-16431701 ]

Eric Yang commented on YARN-7530:
---------------------------------

[~leftnoteasy] I think we would like to keep yarn-service-api inside the yarn-application/yarn-service subtree instead of trying to separate the project into various parts of YARN. This ensures that yarn-service is a kind of YARN application and is completely optional from YARN's point of view. It will also be easier to develop than going into 5 different sub-projects to change code.

> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---------------------------------------------------------------
>
>                 Key: YARN-7530
>                 URL: https://issues.apache.org/jira/browse/YARN-7530
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn-native-services
>    Affects Versions: 3.1.0
>            Reporter: Eric Yang
>            Assignee: Chandni Singh
>            Priority: Trivial
>             Fix For: yarn-native-services
>
>         Attachments: YARN-7530.001.patch
>
>
> hadoop-yarn-services-api is currently a project parallel to the
> hadoop-yarn-services project. It would be better if hadoop-yarn-services-api
> were part of hadoop-yarn-services for correctness.
[jira] [Updated] (YARN-8133) Doc link broken for yarn-service from overview page.
[ https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S updated YARN-8133:
------------------------------------
    Attachment: YARN-8133.02.patch

> Doc link broken for yarn-service from overview page.
> ----------------------------------------------------
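For context, a broken link of the kind YARN-8133 describes is usually fixed by pointing the Markdown source at the generated HTML page rather than the Markdown file. The file name below is a hypothetical example, not taken from the actual patch:

```markdown
<!-- Broken: links to the Markdown source, which is not deployed with the site -->
[QuickStart](QuickStart.md)

<!-- Fixed: links to the generated HTML page -->
[QuickStart](QuickStart.html)
```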
[jira] [Commented] (YARN-8100) Support API interface to query cluster attributes and attribute to nodes
[ https://issues.apache.org/jira/browse/YARN-8100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431690#comment-16431690 ]

Bibin A Chundatt commented on YARN-8100:
----------------------------------------

Thanks [~Naganarasimha] for the review and commit.

> Support API interface to query cluster attributes and attribute to nodes
> ------------------------------------------------------------------------
>
>                 Key: YARN-8100
>                 URL: https://issues.apache.org/jira/browse/YARN-8100
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Major
>             Fix For: YARN-3409
>
>         Attachments: YARN-8100-YARN-3409.001.patch,
> YARN-8100-YARN-3409.002.patch, YARN-8100-YARN-3409.003.patch,
> YARN-8100-YARN-3409.004.patch, YARN-8100-YARN-3409.005.patch,
> YARN-8100-YARN-3409.006.patch, YARN-8100-YARN-3409.007.patch
>
>
> This JIRA is to add APIs to query cluster node attributes and the
> attributes-to-nodes mapping.
> *YarnClient*
> {code}
> getAttributesToNodes()
> getAttributesToNodes(Set attribute)
> getClusterAttributes()
> {code}
[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431680#comment-16431680 ]

Arun Suresh commented on YARN-8135:
-----------------------------------

Interesting! Would like to help out; awaiting the design doc. I think this should be renamed to YARN-Submarine, though.

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning
> training / serving jobs on Hadoop
> ---------------------------------------------------------------------------
>
>                 Key: YARN-8135
>                 URL: https://issues.apache.org/jira/browse/YARN-8135
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Major
>         Attachments: image-2018-04-09-14-35-16-778.png,
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
> - Allow infra engineers / data scientists to run *unmodified* Tensorflow jobs on YARN.
> - Allow jobs easy access to data/models in HDFS and other storages.
> - Can launch services to serve Tensorflow/MXNet models.
> - Support running distributed Tensorflow jobs with simple configs.
> - Support running user-specified Docker images.
> - Support specifying GPU and other resources.
> - Support launching tensorboard if the user specifies it.
> - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
> - Because a submarine is the only vehicle that can let humans explore deep places. B-)
> Comparison to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> * GPU isolation in the XLearning project is achieved by a patched YARN, which is different from the community's GPU isolation solution.
> ** XLearning needs a few modifications to read ClusterSpec from env.
> *References:*
> - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
> - TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
> - Spark Deep Learning (Databricks): [https://github.com/databricks/spark-deep-learning]
> - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
> - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]
[jira] [Updated] (YARN-8137) Parallelize node addition in SLS
[ https://issues.apache.org/jira/browse/YARN-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Modi updated YARN-8137:
--------------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: YARN-5065

> Parallelize node addition in SLS
> --------------------------------
>
>                 Key: YARN-8137
>                 URL: https://issues.apache.org/jira/browse/YARN-8137
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Abhishek Modi
>            Assignee: Abhishek Modi
>            Priority: Major
>         Attachments: YARN-8137.001.patch
>
>
> Right now, nodes are added sequentially, and this can take a long time when
> there are a large number of nodes. With this change, nodes will be added in
> parallel, reducing the node addition time.
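The parallelization described in this JIRA can be sketched with a plain thread pool. This is a self-contained illustration under simplifying assumptions, not the actual SLS patch: `addNode` stands in for the per-node registration work that the SLS runner performs, and the class name is invented for the example.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch of parallel node addition, not the YARN-8137 patch.
public class ParallelNodeAdder {
    // Stand-in for the simulated-node registry kept by the SLS runner.
    private final ConcurrentHashMap.KeySetView<String, Boolean> nodes =
            ConcurrentHashMap.newKeySet();

    // Stand-in for the per-node registration work (an RM round trip in SLS).
    private void addNode(String nodeId) {
        nodes.add(nodeId);
    }

    // Before the change: a sequential loop over all nodes. After: submit each
    // registration to a fixed-size pool and wait for every task to finish.
    public int addNodes(List<String> nodeIds, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletableFuture.allOf(
                nodeIds.stream()
                       .map(id -> CompletableFuture.runAsync(() -> addNode(id), pool))
                       .toArray(CompletableFuture[]::new)
            ).join(); // block until all registrations complete
        } finally {
            pool.shutdown();
        }
        return nodes.size();
    }
}
```

The speedup comes from overlapping the per-node latency; the concurrent set keeps registration thread-safe without extra locking.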
[jira] [Updated] (YARN-8137) Parallelize node addition in SLS
[ https://issues.apache.org/jira/browse/YARN-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Modi updated YARN-8137:
--------------------------------
    Attachment: YARN-8137.001.patch

> Parallelize node addition in SLS
> --------------------------------
[jira] [Created] (YARN-8137) Parallelize node addition in SLS
Abhishek Modi created YARN-8137:
-----------------------------------
             Summary: Parallelize node addition in SLS
                 Key: YARN-8137
                 URL: https://issues.apache.org/jira/browse/YARN-8137
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Abhishek Modi
            Assignee: Abhishek Modi
[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before
[ https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431661#comment-16431661 ]

Tao Yang commented on YARN-6629:
--------------------------------

Hi, [~hunanmei...@gmail.com]. Yes, it's the same question. Attached a new patch for branch-2 which can also be cleanly applied to branch-2.9 and branch-2.9.0. The new patch is nearly the same as the trunk one. [~leftnoteasy], please help to review and commit. Thanks!

> NPE occurred when container allocation proposal is applied but its resource
> requests are removed before
> ---------------------------------------------------------------------------
>
>                 Key: YARN-6629
>                 URL: https://issues.apache.org/jira/browse/YARN-6629
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.9.0, 3.0.0-alpha2
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>             Fix For: 3.1.0
>
>         Attachments: YARN-6629.001.patch, YARN-6629.002.patch,
> YARN-6629.003.patch, YARN-6629.004.patch, YARN-6629.005.patch,
> YARN-6629.006.patch, YARN-6629.branch-2.001.patch
>
>
> I wrote a test case to reproduce another problem for branch-2 and found a new
> NPE error, log:
> {code}
> FATAL event.EventDispatcher (EventDispatcher.java:run(75)) - Error in handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:446)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:516)
>     at org.apache.hadoop.yarn.client.TestNegativePendingResource$1.answer(TestNegativePendingResource.java:225)
>     at org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31)
>     at org.mockito.internal.MockHandler.handle(MockHandler.java:97)
>     at org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp$$EnhancerByMockitoWithCGLIB$$29eb8afc.apply()
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2396)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2281)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1247)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1236)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1325)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1112)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:987)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1367)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:143)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> Reproduce this error in chronological order:
> 1. AM started and requested 1 container with schedulerRequestKey#1:
>    ApplicationMasterService#allocate --> CapacityScheduler#allocate --> SchedulerApplicationAttempt#updateResourceRequests --> AppSchedulingInfo#updateResourceRequests
>    Added schedulerRequestKey#1 into schedulerKeyToPlacementSets
> 2. Scheduler allocated 1 container for this request and accepted the proposal
> 3. AM removed this request
>    ApplicationMasterService#allocate --> CapacityScheduler#allocate --> SchedulerApplicationAttempt#updateResourceRequests --> AppSchedulingInfo#updateResourceRequests --> AppSchedulingInfo#addToPlacementSets --> AppSchedulingInfo#updatePendingResources
>    Removed schedulerRequestKey#1 from schedulerKeyToPlacementSets
> 4. Scheduler applied this proposal
>    CapacityScheduler#tryCommit --> FiCaSchedulerApp#apply --> AppSchedulingInfo#allocate
>    Throw NPE when called schedulerKeyToPlacementSets.get(schedulerRequestKey).allocate(schedulerKey, type, node);
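The four-step race above boils down to a map lookup that can return null once the AM has withdrawn the request. The following is a minimal, self-contained sketch of the defensive check a fix of this shape would plausibly add; class and method names are simplified stand-ins, not the actual YARN-6629 patch to AppSchedulingInfo#allocate:

```java
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the scheduler's key -> placement-set map.
// Illustrative sketch only, not the actual YARN-6629 patch.
public class AllocateSketch {
    private final ConcurrentHashMap<Integer, String> keyToPlacementSet =
            new ConcurrentHashMap<>();

    public void addRequest(int schedulerKey, String placementSet) {
        keyToPlacementSet.put(schedulerKey, placementSet);
    }

    public void removeRequest(int schedulerKey) {
        keyToPlacementSet.remove(schedulerKey);
    }

    // Before: keyToPlacementSet.get(key).allocate(...) throws NPE when the
    // AM has already removed the request (step 3 races with step 4).
    // After: check for null and reject the now-stale allocation proposal
    // instead of crashing the event dispatcher.
    public boolean allocate(int schedulerKey) {
        String ps = keyToPlacementSet.get(schedulerKey);
        if (ps == null) {
            return false; // request was removed; skip this proposal
        }
        return true; // proceed with allocation against ps
    }
}
```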
[jira] [Updated] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before
[ https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-6629:
---------------------------
    Attachment: YARN-6629.branch-2.001.patch

> NPE occurred when container allocation proposal is applied but its resource
> requests are removed before
> ---------------------------------------------------------------------------
[jira] [Commented] (YARN-8126) [Follow up] Support auto-spawning of admin configured services during bootstrap of rm
[ https://issues.apache.org/jira/browse/YARN-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431607#comment-16431607 ]

Gour Saha commented on YARN-8126:
---------------------------------

[~rohithsharma] the patch looks good. A few minor comments:

h5. SystemServiceManagerImpl.java
getbadDirSkipCounter: make the b in bad uppercase

h5. Configurations.md
All service-AM-specific configs go here. If I understand correctly, {{yarn.service.system-service.dir}} is a cluster-specific config, right?

Also, thanks for deleting TestSystemServiceManager.java, which had all the upgrade-specific tests. I think I missed this in my first-round review :)

> [Follow up] Support auto-spawning of admin configured services during
> bootstrap of rm
> ---------------------------------------------------------------------
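For context, the cluster-specific property under discussion would be set in yarn-site.xml on the ResourceManager, roughly as below. The property name comes from the JIRA thread itself; the value is an illustrative example path, not a default:

```xml
<!-- Directory scanned at RM bootstrap for admin-configured system services -->
<!-- to auto-spawn (YARN-8048 / YARN-8126). Example value, not a default. -->
<property>
  <name>yarn.service.system-service.dir</name>
  <value>/services/system-services</value>
</property>
```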
[jira] [Comment Edited] (YARN-7142) Support placement policy in yarn native services
[ https://issues.apache.org/jira/browse/YARN-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429610#comment-16429610 ]

Weiwei Yang edited comment on YARN-7142 at 4/10/18 2:08 AM:
------------------------------------------------------------

Hi [~gsaha]/[~leftnoteasy]

Thanks for back-porting this to branch-3.1. Not related to this task, but I have a question about the format of the placement policy in the yaml file. It looks like an interpretation of how we specify placement constraints using the Java API. I think we should be able to support a simple PC language, by specifying something like:

{code:java}
notin,node,foo
{code}

See more in [this doc|https://issues.apache.org/jira/secure/attachment/12911872/Placement%20Constraint%20Expression%20Syntax%20Specification.pdf] in YARN-7921. I know this is only used in distributed shell as a demo, but if we find this expression easier to write, maybe we can use it here too? Just want to know your opinion. Thanks

> Support placement policy in yarn native services
> ------------------------------------------------
>
>                 Key: YARN-7142
>                 URL: https://issues.apache.org/jira/browse/YARN-7142
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn-native-services
>            Reporter: Billie Rinaldi
>            Assignee: Gour Saha
>            Priority: Major
>             Fix For: 3.2.0
>
>         Attachments: YARN-7142-branch-3.1.004.patch, YARN-7142.001.patch,
> YARN-7142.002.patch, YARN-7142.003.patch, YARN-7142.004.patch
>
>
> Placement policy exists in the API but is not implemented yet.
> I have filed YARN-8074 to move the composite constraints implementation out
> of this phase-1 implementation of placement policy.
[jira] [Updated] (YARN-7088) Fix application start time and add submit time to UIs
[ https://issues.apache.org/jira/browse/YARN-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kanwaljeet Sachdev updated YARN-7088: - Attachment: YARN-7088.014.patch > Fix application start time and add submit time to UIs > - > > Key: YARN-7088 > URL: https://issues.apache.org/jira/browse/YARN-7088 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Abdullah Yousufi >Assignee: Kanwaljeet Sachdev >Priority: Major > Attachments: YARN-7088.001.patch, YARN-7088.002.patch, > YARN-7088.003.patch, YARN-7088.004.patch, YARN-7088.005.patch, > YARN-7088.006.patch, YARN-7088.007.patch, YARN-7088.008.patch, > YARN-7088.009.patch, YARN-7088.010.patch, YARN-7088.011.patch, > YARN-7088.012.patch, YARN-7088.013.patch, YARN-7088.014.patch > > > Currently, the start time in the old and new UI actually shows the app > submission time. There should actually be two different fields; one for the > app's submission and one for its start, as well as the elapsed pending time > between the two. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431580#comment-16431580 ] Keqiu Hu commented on YARN-8135: [~leftnoteasy], {quote}Since tensorflow supports to read HDFS, ideally all platform can support this :). What I meant here is, TF read HDFS needs lots of configurations, and needs some specific optimization / considerations to make HDFS access from Docker container easier. Our on-going prototype covers some of this problem. {quote} I don't think it would be hard to make HDFS access from a Docker container easier, though. But it's worth mentioning data locality, which is not possible with the Kubeflow solution :). Looking forward to the design doc, will comment more later. > Hadoop {Submarine} Project: Simple and scalable deployment of deep learning > training / serving jobs on Hadoop > - > > Key: YARN-8135 > URL: https://issues.apache.org/jira/browse/YARN-8135 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: image-2018-04-09-14-35-16-778.png, > image-2018-04-09-14-44-41-101.png > > > Description: > *Goals:* > - Allow infra engineers / data scientists to run *unmodified* Tensorflow jobs > on YARN. > - Allow jobs easy access to data/models in HDFS and other storages. > - Can launch services to serve Tensorflow/MXNet models. > - Support running distributed Tensorflow jobs with simple configs. > - Support running user-specified Docker images. > - Support specifying GPU and other resources. > - Support launching tensorboard if the user specifies it. > - Support customized DNS names for roles (like tensorboard.$user.$domain:6006) > *Why this name?* > - Because Submarine is the only vehicle that can let humans explore deep > places. B-) > Compared to other projects: > !image-2018-04-09-14-44-41-101.png! > *Notes:* > *GPU Isolation of the XLearning project is achieved by patched YARN, which is > different from the community’s GPU isolation solution. 
> **XLearning needs a few modifications to read ClusterSpec from env. > *References:* > - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark] > - TensorFlowOnYARN (Intel): > [https://github.com/Intel-bigdata/TensorFlowOnYARN] > - Spark Deep Learning (Databricks): > [https://github.com/databricks/spark-deep-learning] > - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning] > - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]
[jira] [Commented] (YARN-8103) Add CLI interface to query node attributes
[ https://issues.apache.org/jira/browse/YARN-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431562#comment-16431562 ] Naganarasimha G R commented on YARN-8103: - [~bibinchundatt], you were right; as discussed offline I wanted to put the above comment in YARN-8104, and the split seems fine. Also, as discussed in the meeting: # Cluster CLI will list the cluster attributes # Attributes CLI provides an API to get the mapping of attribute(s) to nodes and the value configured # Node CLI should provide the attributes configured for a node and the values mapped for each attribute. The only point of discussion here is: should the attribute CLI also list all cluster attributes? Though duplicate, it would be helpful for a user. > Add CLI interface to query node attributes > --- > > Key: YARN-8103 > URL: https://issues.apache.org/jira/browse/YARN-8103 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > > YARN-8100 will add API interface for querying the attributes. CLI interface > for querying node attributes for each nodes and list all attributes in > cluster.
[jira] [Updated] (YARN-8103) Add CLI interface to query node attributes
[ https://issues.apache.org/jira/browse/YARN-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-8103: Description: YARN-8100 will add API interface for querying the attributes. CLI interface for querying node attributes for each nodes and list all attributes in cluster. (was: YARN-8100 will adds API interface for querying the attributes. CLI interface for querying node attributes for each nodes and list all attributes in cluster.) > Add CLI interface to query node attributes > --- > > Key: YARN-8103 > URL: https://issues.apache.org/jira/browse/YARN-8103 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > > YARN-8100 will add API interface for querying the attributes. CLI interface > for querying node attributes for each nodes and list all attributes in > cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8104) Add API to fetch node to attribute mapping
[ https://issues.apache.org/jira/browse/YARN-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431559#comment-16431559 ] Naganarasimha G R commented on YARN-8104: - Hi [~bibinchundatt], now that YARN-8100 is in, can you rebase on this? Also, there was one point I thought of adding in 8100: GetAttributesToNodesResponseProto could have had the field attributeToNodes named attributesToNodes. Can you incorporate that here? > Add API to fetch node to attribute mapping > -- > > Key: YARN-8104 > URL: https://issues.apache.org/jira/browse/YARN-8104 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8104-YARN-3409.001.patch > > > Add node/host to attribute mapping in yarn client API.
[jira] [Commented] (YARN-8118) Better utilize gracefully decommissioning node managers
[ https://issues.apache.org/jira/browse/YARN-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431557#comment-16431557 ] Karthik Palaniappan commented on YARN-8118: --- Sure – I think I get the use case you guys are describing – I'm just trying to understand why that's different than option #2 (wait for running containers to finish, then decommission the node immediately after). Is the idea that those 20 minute containers would drain shuffle from decommissioning nodes faster than the 10 minute timeout? So then Jason's comment about gracefully decommissioning on a "sufficiently large cluster" makes sense. So as an admin you just need to set this timeout to enough time to finish in-progress containers, finish the current stage (e.g. the map stage), and at least start all tasks in the next stage (e.g. the reduce stage) to drain shuffle. But you don't necessarily need to wait for the entire application to finish. I still think option #2 and option #3 are both valid secondary use cases, so I'm inclined to make an enum parameter for "graceful decommission strategy". In terms of plumbing the flag through, using XML config is by far the easiest. But I can see an argument that this should be a parameter on a per-decommission-rpc basis. Thoughts? 
> Better utilize gracefully decommissioning node managers > --- > > Key: YARN-8118 > URL: https://issues.apache.org/jira/browse/YARN-8118 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 2.8.2 > Environment: * Google Compute Engine (Dataproc) > * Java 8 > * Hadoop 2.8.2 using client-mode graceful decommissioning >Reporter: Karthik Palaniappan >Priority: Major > Attachments: YARN-8118-branch-2.001.patch > > > Proposal design doc with background + details (please comment directly on > doc): > [https://docs.google.com/document/d/1hF2Bod_m7rPgSXlunbWGn1cYi3-L61KvQhPlY9Jk9Hk/edit#heading=h.ab4ufqsj47b7] > tl;dr Right now, DECOMMISSIONING nodes must wait for in-progress applications > to complete before shutting down, but they cannot run new containers from > those in-progress applications. This is wasteful, particularly in > environments where you are billed by resource usage (e.g. EC2). > Proposal: YARN should schedule containers from in-progress applications on > DECOMMISSIONING nodes, but should still avoid scheduling containers from new > applications. That will make in-progress applications complete faster and let > nodes decommission faster. Overall, this should be cheaper. > I have a working patch without unit tests that's surprisingly just a few real > lines of code (patch 001). If folks are happy with the proposal, I'll write > unit tests and also write a patch targeted at trunk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
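The "enum parameter for graceful decommission strategy" floated in the comment above might look something like this; the constant names are assumptions for illustration, not committed YARN configuration values:

```java
// Hypothetical enum for the graceful-decommission strategies discussed above.
// Constant names are assumptions, not actual YARN config values.
public enum DecomStrategy {
    IMMEDIATE,          // #1: forceful decommission, kill running containers now
    DRAIN_CONTAINERS,   // #2: wait for running containers, schedule no new ones
    DRAIN_APPLICATIONS  // #3: keep scheduling for in-progress apps until they finish
}
```

Whether this is plumbed through as an XML property or as a per-decommission RPC parameter is exactly the open question in the thread.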
[jira] [Updated] (YARN-8079) Support specify files to be downloaded (localized) before containers launched by YARN
[ https://issues.apache.org/jira/browse/YARN-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8079: - Summary: Support specify files to be downloaded (localized) before containers launched by YARN (was: YARN native service should respect source file of ConfigFile inside Service/Component spec) > Support specify files to be downloaded (localized) before containers launched > by YARN > - > > Key: YARN-8079 > URL: https://issues.apache.org/jira/browse/YARN-8079 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8079.001.patch, YARN-8079.002.patch, > YARN-8079.003.patch, YARN-8079.004.patch, YARN-8079.005.patch > > > Currently, {{srcFile}} is not respected. {{ProviderUtils}} doesn't properly > read srcFile; instead it always constructs {{remoteFile}} using > componentDir and the fileName of {{destFile}}: > {code} > Path remoteFile = new Path(compInstanceDir, fileName); > {code} > To me it is a common use case: services have some files in HDFS that > need to be localized when components get launched. (For example, if we > want to serve a Tensorflow model, we need to localize the Tensorflow model > (typically not huge, less than a GB) to local disk. Otherwise the launched docker > container has to access HDFS.)
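The fix direction described in the issue — prefer a user-supplied srcFile over the path derived from the component instance dir — can be sketched as follows. This uses java.nio paths instead of Hadoop's Path type, and the method and parameter names are assumptions, not the actual ProviderUtils code:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch of respecting srcFile when resolving the remote file
// to localize. Uses java.nio instead of Hadoop's Path; names are assumptions.
public class LocalizePathSketch {
    public static Path resolveRemoteFile(String srcFile,
                                         String compInstanceDir,
                                         String fileName) {
        if (srcFile != null && !srcFile.isEmpty()) {
            return Paths.get(srcFile);               // respect user-supplied srcFile
        }
        return Paths.get(compInstanceDir, fileName); // current fallback behavior
    }
}
```

With a srcFile such as `/models/tf_model.pb` the service would localize that HDFS file directly, instead of always deriving the path from the component dir and destFile name.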
[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services
[ https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431548#comment-16431548 ] Wangda Tan commented on YARN-7530: -- A quick proposal for this: - ApiServerClient/ServiceClient -> yarn-client - ApiServer/WebApp -> yarn-server/native-service - hadoop-yarn-services-core/api -> yarn-api/common > hadoop-yarn-services-api should be part of hadoop-yarn-services > --- > > Key: YARN-7530 > URL: https://issues.apache.org/jira/browse/YARN-7530 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Chandni Singh >Priority: Trivial > Fix For: yarn-native-services > > Attachments: YARN-7530.001.patch > > > Hadoop-yarn-services-api is currently a parallel project to > hadoop-yarn-services project. It would be better if hadoop-yarn-services-api > is part of hadoop-yarn-services for correctness. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8037) CGroupsResourceCalculator logs excessive warnings on container relaunch
[ https://issues.apache.org/jira/browse/YARN-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431526#comment-16431526 ] Miklos Szegedi commented on YARN-8037: -- Thank you, [~shaneku...@gmail.com]. How about hashing the stack trace of the exception and reporting it only, if it has not been seen before? > CGroupsResourceCalculator logs excessive warnings on container relaunch > --- > > Key: YARN-8037 > URL: https://issues.apache.org/jira/browse/YARN-8037 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Priority: Major > > When a container is relaunched, the old process no longer exists. When using > the {{CGroupsResourceCalculator}} this results in the warning and exception > below being logged every second until the relaunch occurs, which is excessive > and filling up the logs. > {code:java} > 2018-03-16 14:30:33,438 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator: > Failed to parse 12844 > org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the > interim 12844 > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457) > Caused by: java.io.FileNotFoundException: > 
/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_e01_1521209613260_0002_01_02/cpuacct.stat > (No such file or directory) > at java.io.FileInputStream.open0(Native Method) > at java.io.FileInputStream.open(FileInputStream.java:195) > at java.io.FileInputStream.(FileInputStream.java:138) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320) > ... 4 more > 2018-03-16 14:30:33,438 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator: > Failed to parse cgroups > /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.memsw.usage_in_bytes > org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the > interim 12844 > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457) > Caused by: java.io.FileNotFoundException: > /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.usage_in_bytes > (No such file or directory) > at java.io.FileInputStream.open0(Native Method) > at java.io.FileInputStream.open(FileInputStream.java:195) > at java.io.FileInputStream.(FileInputStream.java:138) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320) > ... 4 more{code} > We should consider moving the exception to debug to reduce the noise at a > minimum. Alternatively, it may make sense to stop the existing > {{MonitoringThread}} during relaunch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
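The dedup idea suggested in the comment — hash the exception's stack trace and report the full trace only when it has not been seen before — could be sketched like this. Class and method names are hypothetical, not the actual YARN-8037 patch:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: remember a hash of each exception's stack trace and
// report the full trace only the first time that trace shape is seen.
public class DedupLogger {
    private final Set<Integer> seenTraces = ConcurrentHashMap.newKeySet();

    /** Returns true only the first time this stack-trace shape is seen. */
    public boolean shouldLogFullTrace(Throwable t) {
        int traceHash = Arrays.hashCode(t.getStackTrace());
        return seenTraces.add(traceHash); // Set.add() is false for duplicates
    }
}
```

Repeated "process vanished in the interim" exceptions from the same code path would then log the full stack trace once and a one-line warning thereafter.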
[jira] [Commented] (YARN-8118) Better utilize gracefully decommissioning node managers
[ https://issues.apache.org/jira/browse/YARN-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431524#comment-16431524 ] Robert Kanter commented on YARN-8118: - Thanks for your ideas [~Karthik Palaniappan]. Consider this scenario: You want to gracefully decommission a node with a timeout of 10 minutes. Suppose you have a job that has containers which normally take 20 minutes to run. At this point, we wouldn't want to start any of those containers on that node because they're not going to finish before the decom timeout ends, so they'd just get killed halfway through; instead of running on another node, which would be faster overall. I'm fine with adding an option for the behavior you're describing, but I don't think we can change the default behavior here (it's also not a "bugfix" like your design doc suggests; as [~jlowe], [~djp], and my above scenario show, there are valid use cases for the current behavior). > Better utilize gracefully decommissioning node managers > --- > > Key: YARN-8118 > URL: https://issues.apache.org/jira/browse/YARN-8118 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 2.8.2 > Environment: * Google Compute Engine (Dataproc) > * Java 8 > * Hadoop 2.8.2 using client-mode graceful decommissioning >Reporter: Karthik Palaniappan >Priority: Major > Attachments: YARN-8118-branch-2.001.patch > > > Proposal design doc with background + details (please comment directly on > doc): > [https://docs.google.com/document/d/1hF2Bod_m7rPgSXlunbWGn1cYi3-L61KvQhPlY9Jk9Hk/edit#heading=h.ab4ufqsj47b7] > tl;dr Right now, DECOMMISSIONING nodes must wait for in-progress applications > to complete before shutting down, but they cannot run new containers from > those in-progress applications. This is wasteful, particularly in > environments where you are billed by resource usage (e.g. EC2). 
> Proposal: YARN should schedule containers from in-progress applications on > DECOMMISSIONING nodes, but should still avoid scheduling containers from new > applications. That will make in-progress applications complete faster and let > nodes decommission faster. Overall, this should be cheaper. > I have a working patch without unit tests that's surprisingly just a few real > lines of code (patch 001). If folks are happy with the proposal, I'll write > unit tests and also write a patch targeted at trunk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""
[ https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431519#comment-16431519 ] genericqa commented on YARN-8116: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 47s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 28s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 34s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 76m 14s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-8116 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918280/YARN-8116.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f07f3df7260c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 907919d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20280/testReport/ | | Max. process+thread count | 341 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20280/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Nodemanager fails with NumberFormatException: For input
[jira] [Commented] (YARN-8100) Support API interface to query cluster attributes and attribute to nodes
[ https://issues.apache.org/jira/browse/YARN-8100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431485#comment-16431485 ] Naganarasimha G R commented on YARN-8100: - Thanks [~bibinchundatt], the latest patch looks good to me; will commit it shortly. > Support API interface to query cluster attributes and attribute to nodes > > > Key: YARN-8100 > URL: https://issues.apache.org/jira/browse/YARN-8100 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8100-YARN-3409.001.patch, > YARN-8100-YARN-3409.002.patch, YARN-8100-YARN-3409.003.patch, > YARN-8100-YARN-3409.004.patch, YARN-8100-YARN-3409.005.patch, > YARN-8100-YARN-3409.006.patch, YARN-8100-YARN-3409.007.patch > > > Jira is to add an API to query cluster node attributes and attribute-to-node > mappings > *YarnClient* > {code} > getAttributesToNodes() > getAttributesToNodes(Set attribute) > getClusterAttributes() > {code}
[jira] [Commented] (YARN-8118) Better utilize gracefully decommissioning node managers
[ https://issues.apache.org/jira/browse/YARN-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431472#comment-16431472 ] Karthik Palaniappan commented on YARN-8118: --- Not sure I understand your use cases (@Jason/@Junping). For jobs that produce shuffle data (i.e. all Hadoop-ecosystem jobs?), killing a container is just as bad as removing the shuffle it produced. I can imagine a few reasonable scenarios around removing nodes: 1) immediately remove nodes (regular decommissioning) 2) wait for containers to finish, but don't wait until applications finish (scenarios where shuffle doesn't matter) 3) wait for apps to finish and let in-progress apps use decommissioning nodes #1 is regular (forceful) decommissioning. #3 is my proposal – focused at cloud environments with potentially drastic scaling events. #2 makes sense for non-cloud environments where few nodes are being removed at a time. It also makes sense when running jobs that don't produce shuffle output. So if you're willing to tolerate a behavioral change, maybe #2 should be the default, and #3 should be an additional flag (either an XML property or a flag on the graceful decommission request). However, as currently implemented, it seems like graceful decommissioning is the worst of all worlds – wait for apps to finish, but don't let apps use decommissioning nodes. Am I missing something obvious here? I couldn't find anything in the original design docs discussing why it was implemented that way. 
> Better utilize gracefully decommissioning node managers > --- > > Key: YARN-8118 > URL: https://issues.apache.org/jira/browse/YARN-8118 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 2.8.2 > Environment: * Google Compute Engine (Dataproc) > * Java 8 > * Hadoop 2.8.2 using client-mode graceful decommissioning >Reporter: Karthik Palaniappan >Priority: Major > Attachments: YARN-8118-branch-2.001.patch > > > Proposal design doc with background + details (please comment directly on > doc): > [https://docs.google.com/document/d/1hF2Bod_m7rPgSXlunbWGn1cYi3-L61KvQhPlY9Jk9Hk/edit#heading=h.ab4ufqsj47b7] > tl;dr Right now, DECOMMISSIONING nodes must wait for in-progress applications > to complete before shutting down, but they cannot run new containers from > those in-progress applications. This is wasteful, particularly in > environments where you are billed by resource usage (e.g. EC2). > Proposal: YARN should schedule containers from in-progress applications on > DECOMMISSIONING nodes, but should still avoid scheduling containers from new > applications. That will make in-progress applications complete faster and let > nodes decommission faster. Overall, this should be cheaper. > I have a working patch without unit tests that's surprisingly just a few real > lines of code (patch 001). If folks are happy with the proposal, I'll write > unit tests and also write a patch targeted at trunk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""
[ https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431444#comment-16431444 ] Chandni Singh commented on YARN-8116: - Patch 2 also includes a check in {{NMLeveldbStateStoreService}} for existing empty lists in the db store, plus a test for it. > Nodemanager fails with NumberFormatException: For input string: "" > -- > > Key: YARN-8116 > URL: https://issues.apache.org/jira/browse/YARN-8116 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8116.001.patch, YARN-8116.002.patch > > > Steps followed. > 1) Update nodemanager debug delay config > {code} > > yarn.nodemanager.delete.debug-delay-sec > 350 > {code} > 2) Launch distributed shell application multiple times > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn jar > hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 120" > -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar > hadoop-yarn-applications-distributedshell-*.jar{code} > 3) restart NM > Nodemanager fails to start with below error. > {code:title=NM log} > 2018-03-23 21:32:14,437 INFO monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: > true > 2018-03-23 21:32:14,439 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set > as 3600. 
The logs will be aggregated every 3600 seconds > 2018-03-23 21:32:14,455 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl > failed in state INITED > java.lang.NumberFormatException: For input string: "" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:601) > at java.lang.Long.parseLong(Long.java:631) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960) > 2018-03-23 21:32:14,458 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceStop(148)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService > waiting for pending aggregation during exit > 2018-03-23 21:32:14,460 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state > INITED > 
java.lang.NumberFormatException: For input string: "" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:601) > at java.lang.Long.parseLong(Long.java:631) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at >
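The failure above is Long.parseLong("") on an empty value recovered from the leveldb store during NM restart. A minimal sketch of the kind of guard the patch discussion describes; the class and method names here are illustrative, not the actual NMLeveldbStateStoreService change:

```java
// Long.parseLong("") throws NumberFormatException, so any value recovered
// from the NM state store must be guarded before parsing. Names below are
// illustrative, not from the actual patch.
public class StateStoreParse {

    static long parseLongOrDefault(String raw, long defaultValue) {
        // An empty or null value can appear in the store if a previous run
        // wrote a key with no payload; treat it as "no recorded value"
        // instead of failing NM recovery outright.
        if (raw == null || raw.isEmpty()) {
            return defaultValue;
        }
        return Long.parseLong(raw);
    }

    public static void main(String[] args) {
        System.out.println(parseLongOrDefault("631", 0L)); // 631
        System.out.println(parseLongOrDefault("", 0L));    // 0, no exception
    }
}
```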
[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable
[ https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431442#comment-16431442 ] Hudson commented on YARN-7667: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13947 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13947/]) YARN-7667. Docker Stop grace period should be configurable. Contributed (jlowe: rev 907919d28c1b7e4496d189b46ecbb86a10d41339) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml > Docker Stop grace period should be configurable > --- > > Key: YARN-7667 > URL: https://issues.apache.org/jira/browse/YARN-7667 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-7667.001.patch, YARN-7667.002.patch, > YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch, > YARN-7667.006.patch > > > {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never > called. So, the stop uses the 10 second default grace period from docker -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
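A minimal sketch of what "configurable grace period" means in practice: read a configured value and pass it to docker stop --time instead of always relying on Docker's 10-second default. The property key and plumbing below are assumptions for illustration; the real key and default live in YarnConfiguration and yarn-default.xml as edited by the commit above.

```java
import java.util.Map;

// Sketch in the spirit of YARN-7667: honor a configured grace period when
// building the "docker stop" command. The property key and default below
// are assumptions for illustration, not quoted from the patch.
public class DockerStopSketch {

    static final String GRACE_PERIOD_KEY =
        "yarn.nodemanager.runtime.linux.docker.stop.grace-period"; // assumed key
    static final int DEFAULT_GRACE_PERIOD_SECS = 10; // docker's own default

    // Build the "docker stop" invocation, using the configured grace period
    // when present and falling back to the 10s default otherwise.
    static String buildStopCommand(Map<String, String> conf, String containerId) {
        int grace = Integer.parseInt(
            conf.getOrDefault(GRACE_PERIOD_KEY,
                String.valueOf(DEFAULT_GRACE_PERIOD_SECS)));
        return "docker stop --time=" + grace + " " + containerId;
    }

    public static void main(String[] args) {
        System.out.println(buildStopCommand(Map.of(GRACE_PERIOD_KEY, "30"), "c123"));
        // docker stop --time=30 c123
    }
}
```

The design point the issue makes is the wiring, not the arithmetic: DockerStopCommand already had a setGracePeriod method, it just was never called with a configured value.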
[jira] [Updated] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""
[ https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8116: Attachment: YARN-8116.002.patch
[jira] [Commented] (YARN-7941) Transitive dependencies for component are not resolved
[ https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431439#comment-16431439 ] genericqa commented on YARN-7941: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 44s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 3s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 32s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 45s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 62m 7s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-7941 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918263/YARN-7941.1.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 0e277bde1947 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9059376 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20279/testReport/ | | Max. process+thread count | 669 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20279/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Transitive dependencies
[jira] [Commented] (YARN-7939) Yarn Service Upgrade: add support to upgrade a component instance
[ https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431423#comment-16431423 ] genericqa commented on YARN-7939: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 9 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 52s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 38s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 31s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 28s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 60 new + 401 unchanged - 2 fixed = 461 total (was 403) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 28m 1s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 3s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 37s{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}118m 36s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-7939 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918251/YARN-7939.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc | | uname | Linux 23c06273ed03 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build
[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable
[ https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431404#comment-16431404 ] genericqa commented on YARN-7667: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 27 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 30s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui hadoop-mapreduce-project . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 13m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 2s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 37m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 25m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 25m 34s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 55s{color} | {color:orange} root: The patch generated 2 new + 1249 unchanged - 1 fixed = 1251 total (was 1250) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 4s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 13s{color} | {color:green} There were no new shelldocs issues. 
{color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 22s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui hadoop-mapreduce-project . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 14m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 0s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} |
[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431369#comment-16431369 ] Wangda Tan commented on YARN-8135: -- [~oliverhuh...@gmail.com], Thanks for the responses, {quote}what does w/o modification mean ? {quote} Without modification of vanilla TF program in order to run on the framework. {quote}As far as Kubeflow is deployed in the same cluster as Hadoop, Kubeflow should be able to access HDFS, through libhdfs or webhdfs interface? {quote} Since tensorflow supports to read HDFS, ideally all platform can support this :). What I meant here is, TF read HDFS needs lots of configurations, and needs some specific optimization / considerations to make HDFS access from Docker container easier. Our on-going prototype covers some of this problem. {quote}ToS kind of supports GPU scheduling (not isolation) base on memory: if you ask for 1 GPU and a machine has 4 GPU, it asks for total memory * the portion of GPU you asked. {quote} This is not easy for user and cannot guarantee proper isolation, so I didn't put a (√) for ToS. > Hadoop {Submarine} Project: Simple and scalable deployment of deep learning > training / serving jobs on Hadoop > - > > Key: YARN-8135 > URL: https://issues.apache.org/jira/browse/YARN-8135 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: image-2018-04-09-14-35-16-778.png, > image-2018-04-09-14-44-41-101.png > > > Description: > *Goals:* > - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs > on YARN. > - Allow jobs easy access data/models in HDFS and other storages. > - Can launch services to serve Tensorflow/MXNet models. > - Support run distributed Tensorflow jobs with simple configs. > - Support run user-specified Docker images. > - Support specify GPU and other resources. > - Support launch tensorboard if user specified. 
> - Support customized DNS name for roles (like tensorboard.$user.$domain:6006) > *Why this name?* > - Because Submarine is the only vehicle can let human to explore deep > places. B-) > Compare to other projects: > !image-2018-04-09-14-44-41-101.png! > *Notes:* > *GPU Isolation of XLearning project is achieved by patched YARN, which is > different from community’s GPU isolation solution. > **XLearning needs few modification to read ClusterSpec from env. > *References:* > - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark] > - TensorFlowOnYARN (Intel): > [https://github.com/Intel-bigdata/TensorFlowOnYARN] > - Spark Deep Learning (Databricks): > [https://github.com/databricks/spark-deep-learning] > - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning] > - Kubeflow (Google): [https://github.com/kubeflow/kubeflow] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-5268) DShell AM fails java.lang.InterruptedException
[ https://issues.apache.org/jira/browse/YARN-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zian Chen resolved YARN-5268. - Resolution: Cannot Reproduce Release Note: Tried to reproduce the issue on a cluster with the latest code; the DShell application completed successfully without any failure using the command provided in the description. Closing as "cannot reproduce". > DShell AM fails java.lang.InterruptedException > -- > > Key: YARN-5268 > URL: https://issues.apache.org/jira/browse/YARN-5268 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Zian Chen >Priority: Critical > Labels: oct16-easy > Attachments: YARN-5268.1.patch > > > Distributed Shell AM failed with the following error > {Code} > 16/06/16 11:08:10 INFO impl.NMClientAsyncImpl: NMClient stopped. > 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Application > completed. Signalling finish to RM > 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Diagnostics., > total=16, completed=19, allocated=21, failed=4 > 16/06/16 11:08:10 INFO impl.AMRMClientImpl: Waiting for application to be > successfully unregistered. > 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Application Master > failed. 
exiting > 16/06/16 11:08:10 INFO impl.AMRMClientAsyncImpl: Interrupted while waiting > for queue > java.lang.InterruptedException > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287) > End of LogType:AppMaster.stderr > {Code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
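The InterruptedException in the log above is the AMRMClientAsyncImpl callback-handler thread being woken out of LinkedBlockingQueue.take() while the client shuts down. A self-contained sketch of that pattern, treating the interrupt as a normal stop signal rather than a failure; class and method names here are illustrative:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the pattern behind the AM log above: a handler thread blocked
// on LinkedBlockingQueue.take() is interrupted when the client stops.
// Treating the interrupt as the expected shutdown path (restore the flag,
// exit the loop) avoids spurious "failed" noise. Names are illustrative.
public class CallbackHandlerSketch {

    static int drainUntilInterrupted(BlockingQueue<String> queue) {
        int handled = 0;
        while (true) {
            try {
                queue.take();       // blocks; an interrupt wakes it up
                handled++;
            } catch (InterruptedException e) {
                // Restore the interrupt flag and return cleanly: this is
                // the normal path when the AM unregisters and stops.
                Thread.currentThread().interrupt();
                return handled;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        q.put("event-1");
        Thread worker = new Thread(() ->
            System.out.println("handled " + drainUntilInterrupted(q)));
        worker.start();
        Thread.sleep(100);   // let the worker consume event-1 and block
        worker.interrupt();  // simulate the async client being stopped
        worker.join();
    }
}
```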
[jira] [Created] (YARN-8136) Add version attribute to site doc examples and quickstart
Gour Saha created YARN-8136: --- Summary: Add version attribute to site doc examples and quickstart Key: YARN-8136 URL: https://issues.apache.org/jira/browse/YARN-8136 Project: Hadoop YARN Issue Type: Sub-task Components: site Reporter: Gour Saha The version attribute is missing in the following 2 site doc files: src/site/markdown/yarn-service/Examples.md and src/site/markdown/yarn-service/QuickStart.md -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
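For context, a YARN service spec carrying the version attribute that YARN-8136 wants the Examples.md and QuickStart.md samples to include might look like the sketch below. The surrounding fields follow the general style of the yarn-service quickstart and are assumptions for illustration, not quoted from this mail:

```json
{
  "name": "sleeper-service",
  "version": "1.0.0",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 1,
      "launch_command": "sleep 900000",
      "resource": { "cpus": 1, "memory": "256" }
    }
  ]
}
```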
[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431345#comment-16431345 ] Keqiu Hu commented on YARN-8135: 1. what does w/o modification mean ? 2. As far as Kubeflow is deployed in the same cluster as Hadoop, Kubeflow should be able to access HDFS, through libhdfs or webhdfs interface? 3. ToS kind of supports GPU scheduling (not isolation) base on memory: if you ask for 1 GPU and a machine has 4 GPU, it asks for total memory * the portion of GPU you asked. Love the name and the curly braces {:) } > Hadoop {Submarine} Project: Simple and scalable deployment of deep learning > training / serving jobs on Hadoop > - > > Key: YARN-8135 > URL: https://issues.apache.org/jira/browse/YARN-8135 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: image-2018-04-09-14-35-16-778.png, > image-2018-04-09-14-44-41-101.png > > > Description: > *Goals:* > - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs > on YARN. > - Allow jobs easy access data/models in HDFS and other storages. > - Can launch services to serve Tensorflow/MXNet models. > - Support run distributed Tensorflow jobs with simple configs. > - Support run user-specified Docker images. > - Support specify GPU and other resources. > - Support launch tensorboard if user specified. > - Support customized DNS name for roles (like tensorboard.$user.$domain:6006) > *Why this name?* > - Because Submarine is the only vehicle can let human to explore deep > places. B-) > Compare to other projects: > !image-2018-04-09-14-44-41-101.png! > *Notes:* > *GPU Isolation of XLearning project is achieved by patched YARN, which is > different from community’s GPU isolation solution. > **XLearning needs few modification to read ClusterSpec from env. 
> *References:* > - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark] > - TensorFlowOnYARN (Intel): > [https://github.com/Intel-bigdata/TensorFlowOnYARN] > - Spark Deep Learning (Databricks): > [https://github.com/databricks/spark-deep-learning] > - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning] > - Kubeflow (Google): [https://github.com/kubeflow/kubeflow] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
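The memory-proportional GPU ask described in point 3 of the comment above can be sketched as follows. This is a purely illustrative calculation, not code from any of the projects mentioned: without native GPU isolation, asking for 1 of a machine's 4 GPUs is approximated by asking for 1/4 of that machine's memory.

```java
// Illustrative sketch of memory-proportional GPU scheduling (an assumption
// based on the comment above, not actual scheduler code): the memory request
// is scaled by the fraction of the host's GPUs being asked for.
public class GpuByMemory {
    static long memoryAskMb(long totalMemMb, int gpusWanted, int gpusOnHost) {
        // e.g. 1 of 4 GPUs on a 256 GB host -> ask for 64 GB of memory
        return totalMemMb * gpusWanted / gpusOnHost;
    }

    public static void main(String[] args) {
        System.out.println(memoryAskMb(256_000, 1, 4)); // 64000
        System.out.println(memoryAskMb(256_000, 4, 4)); // 256000
    }
}
```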
[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8135: - Description: Description: *Goals:* - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on YARN. - Allow jobs easy access data/models in HDFS and other storages. - Can launch services to serve Tensorflow/MXNet models. - Support run distributed Tensorflow jobs with simple configs. - Support run user-specified Docker images. - Support specify GPU and other resources. - Support launch tensorboard if user specified. - Support customized DNS name for roles (like tensorboard.$user.$domain:6006) *Why this name?* - Because Submarine is the only vehicle can let human to explore deep places. B-) Compare to other projects: !image-2018-04-09-14-44-41-101.png! *Notes:* *GPU Isolation of XLearning project is achieved by patched YARN, which is different from community’s GPU isolation solution. **XLearning needs few modification to read ClusterSpec from env. *References:* - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark] - TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN] - Spark Deep Learning (Databricks): [https://github.com/databricks/spark-deep-learning] - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning] - Kubeflow (Google): [https://github.com/kubeflow/kubeflow] was: Description: *Goals:* - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on YARN. - Allow jobs easy access data/models in HDFS and other storages. - Can launch services to serve Tensorflow/MXNet models. - Support run distributed Tensorflow jobs with simple configs. - Support run user-specified Docker images. - Support specify GPU and other resources. - Support launch tensorboard if user specified. 
- Support customized DNS name for roles (like tensorboard.$user.$domain:6006) *Why this name?* - Because Submarine is the only vehicle can take human to deep places. B-) Compare to other projects: !image-2018-04-09-14-44-41-101.png! *Notes:* *GPU Isolation of XLearning project is achieved by patched YARN, which is different from community’s GPU isolation solution. **XLearning needs few modification to read ClusterSpec from env. *References:* - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark] - TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN] - Spark Deep Learning (Databricks): [https://github.com/databricks/spark-deep-learning] - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning] - Kubeflow (Google): [https://github.com/kubeflow/kubeflow] > Hadoop {Submarine} Project: Simple and scalable deployment of deep learning > training / serving jobs on Hadoop > - > > Key: YARN-8135 > URL: https://issues.apache.org/jira/browse/YARN-8135 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: image-2018-04-09-14-35-16-778.png, > image-2018-04-09-14-44-41-101.png > > > Description: > *Goals:* > - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs > on YARN. > - Allow jobs easy access data/models in HDFS and other storages. > - Can launch services to serve Tensorflow/MXNet models. > - Support run distributed Tensorflow jobs with simple configs. > - Support run user-specified Docker images. > - Support specify GPU and other resources. > - Support launch tensorboard if user specified. > - Support customized DNS name for roles (like tensorboard.$user.$domain:6006) > *Why this name?* > - Because Submarine is the only vehicle can let human to explore deep > places. B-) > Compare to other projects: > !image-2018-04-09-14-44-41-101.png! 
> *Notes:* > *GPU Isolation of XLearning project is achieved by patched YARN, which is > different from community’s GPU isolation solution. > **XLearning needs few modification to read ClusterSpec from env. > *References:* > - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark] > - TensorFlowOnYARN (Intel): > [https://github.com/Intel-bigdata/TensorFlowOnYARN] > - Spark Deep Learning (Databricks): > [https://github.com/databricks/spark-deep-learning] > - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning] > - Kubeflow (Google): [https://github.com/kubeflow/kubeflow] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8135: - Description: Description: *Goals:* - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on YARN. - Allow jobs easy access data/models in HDFS and other storages. - Can launch services to serve Tensorflow/MXNet models. - Support run distributed Tensorflow jobs with simple configs. - Support run user-specified Docker images. - Support specify GPU and other resources. - Support launch tensorboard if user specified. - Support customized DNS name for roles (like tensorboard.$user.$domain:6006) *Why this name?* - Because Submarine is the only vehicle can take human to deep places. B-) Compare to other projects: !image-2018-04-09-14-44-41-101.png! *Notes:* *GPU Isolation of XLearning project is achieved by patched YARN, which is different from community’s GPU isolation solution. **XLearning needs few modification to read ClusterSpec from env. *References:* - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark] - TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN] - Spark Deep Learning (Databricks): [https://github.com/databricks/spark-deep-learning] - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning] - Kubeflow (Google): [https://github.com/kubeflow/kubeflow] was: Description: *Goals:* - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on YARN. - Allow jobs easy access data/models in HDFS and other storages. - Can launch services to serve Tensorflow/MXNet models. - Support run distributed Tensorflow jobs with simple configs. - Support run user-specified Docker images. - Support specify GPU and other resources. - Support launch tensorboard if user specified. - Support customized DNS name for roles (like tensorboard.$user.$domain:6006) *Why this name?* - Because Submarine is the only vehicle can take human to deep places. 
B-) Compare to other projects: !image-2018-04-09-14-35-16-778.png! *Notes:* * GPU Isolation of XLearning project is achieved by patched YARN, which is different from community’s GPU isolation solution. ** XLearning needs few modification to read ClusterSpec from env. *References:* - TensorflowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark - TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN - Spark Deep Learning (Databricks): https://github.com/databricks/spark-deep-learning - XLearning (Qihoo360): https://github.com/Qihoo360/XLearning - Kubeflow (Google): https://github.com/kubeflow/kubeflow > Hadoop {Submarine} Project: Simple and scalable deployment of deep learning > training / serving jobs on Hadoop > - > > Key: YARN-8135 > URL: https://issues.apache.org/jira/browse/YARN-8135 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: image-2018-04-09-14-35-16-778.png, > image-2018-04-09-14-44-41-101.png > > > Description: > *Goals:* > - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs > on YARN. > - Allow jobs easy access data/models in HDFS and other storages. > - Can launch services to serve Tensorflow/MXNet models. > - Support run distributed Tensorflow jobs with simple configs. > - Support run user-specified Docker images. > - Support specify GPU and other resources. > - Support launch tensorboard if user specified. > - Support customized DNS name for roles (like tensorboard.$user.$domain:6006) > *Why this name?* > - Because Submarine is the only vehicle can take human to deep places. B-) > Compare to other projects: > !image-2018-04-09-14-44-41-101.png! > *Notes:* > *GPU Isolation of XLearning project is achieved by patched YARN, which is > different from community’s GPU isolation solution. > **XLearning needs few modification to read ClusterSpec from env. 
> *References:* > - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark] > - TensorFlowOnYARN (Intel): > [https://github.com/Intel-bigdata/TensorFlowOnYARN] > - Spark Deep Learning (Databricks): > [https://github.com/databricks/spark-deep-learning] > - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning] > - Kubeflow (Google): [https://github.com/kubeflow/kubeflow] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8135: - Attachment: image-2018-04-09-14-44-41-101.png > Hadoop {Submarine} Project: Simple and scalable deployment of deep learning > training / serving jobs on Hadoop > - > > Key: YARN-8135 > URL: https://issues.apache.org/jira/browse/YARN-8135 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: image-2018-04-09-14-35-16-778.png, > image-2018-04-09-14-44-41-101.png > > > Description: > *Goals:* > - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs > on YARN. > - Allow jobs easy access data/models in HDFS and other storages. > - Can launch services to serve Tensorflow/MXNet models. > - Support run distributed Tensorflow jobs with simple configs. > - Support run user-specified Docker images. > - Support specify GPU and other resources. > - Support launch tensorboard if user specified. > - Support customized DNS name for roles (like tensorboard.$user.$domain:6006) > *Why this name?* > - Because Submarine is the only vehicle can take human to deep places. B-) > Compare to other projects: > !image-2018-04-09-14-35-16-778.png! > *Notes:* > * GPU Isolation of XLearning project is achieved by patched YARN, which is > different from community’s GPU isolation solution. > ** XLearning needs few modification to read ClusterSpec from env. 
> *References:* > - TensorflowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark > - TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN > - Spark Deep Learning (Databricks): > https://github.com/databricks/spark-deep-learning > - XLearning (Qihoo360): https://github.com/Qihoo360/XLearning > - Kubeflow (Google): https://github.com/kubeflow/kubeflow -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7512) Support service upgrade via YARN Service API and CLI
[ https://issues.apache.org/jira/browse/YARN-7512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-7512: Target Version/s: 3.1.1 > Support service upgrade via YARN Service API and CLI > > > Key: YARN-7512 > URL: https://issues.apache.org/jira/browse/YARN-7512 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Gour Saha >Assignee: Chandni Singh >Priority: Major > Fix For: yarn-native-services > > Attachments: _In-Place Upgrade of Long-Running Applications in > YARN_v1.pdf, _In-Place Upgrade of Long-Running Applications in YARN_v2.pdf, > _In-Place Upgrade of Long-Running Applications in YARN_v3.pdf > > > YARN Service API and CLI needs to support service (and containers) upgrade in > line with what Slider supported in SLIDER-787 > (http://slider.incubator.apache.org/docs/slider_specs/application_pkg_upgrade.html) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8081) Yarn Service Upgrade: Add support to upgrade a component
[ https://issues.apache.org/jira/browse/YARN-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8081: Target Version/s: 3.1.1 > Yarn Service Upgrade: Add support to upgrade a component > > > Key: YARN-8081 > URL: https://issues.apache.org/jira/browse/YARN-8081 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8052) Move overwriting of service definition during flex to service master
[ https://issues.apache.org/jira/browse/YARN-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8052: Target Version/s: 3.1.1 > Move overwriting of service definition during flex to service master > > > Key: YARN-8052 > URL: https://issues.apache.org/jira/browse/YARN-8052 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > > The overwrite of service definition during flex is done from the > ServiceClient. > During auto finalization of upgrade, the current service definition gets > overwritten as well by the service master. This creates a potential conflict. > Need to move the overwrite of service definition during flex to the > ServiceClient. > Discussed on YARN-8018. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
[ https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431333#comment-16431333 ] Wangda Tan commented on YARN-8135: -- I'm currently working on a design doc and a prototype, will share more details in the next several days. > Hadoop {Submarine} Project: Simple and scalable deployment of deep learning > training / serving jobs on Hadoop > - > > Key: YARN-8135 > URL: https://issues.apache.org/jira/browse/YARN-8135 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: image-2018-04-09-14-35-16-778.png > > > Description: > *Goals:* > - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs > on YARN. > - Allow jobs easy access data/models in HDFS and other storages. > - Can launch services to serve Tensorflow/MXNet models. > - Support run distributed Tensorflow jobs with simple configs. > - Support run user-specified Docker images. > - Support specify GPU and other resources. > - Support launch tensorboard if user specified. > - Support customized DNS name for roles (like tensorboard.$user.$domain:6006) > *Why this name?* > - Because Submarine is the only vehicle can take human to deep places. B-) > Compare to other projects: > !image-2018-04-09-14-35-16-778.png! > *Notes:* > * GPU Isolation of XLearning project is achieved by patched YARN, which is > different from community’s GPU isolation solution. > ** XLearning needs few modification to read ClusterSpec from env. 
> *References:* > - TensorflowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark > - TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN > - Spark Deep Learning (Databricks): > https://github.com/databricks/spark-deep-learning > - XLearning (Qihoo360): https://github.com/Qihoo360/XLearning > - Kubeflow (Google): https://github.com/kubeflow/kubeflow -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop
Wangda Tan created YARN-8135: Summary: Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop Key: YARN-8135 URL: https://issues.apache.org/jira/browse/YARN-8135 Project: Hadoop YARN Issue Type: New Feature Reporter: Wangda Tan Assignee: Wangda Tan Attachments: image-2018-04-09-14-35-16-778.png Description: *Goals:* - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on YARN. - Allow jobs easy access data/models in HDFS and other storages. - Can launch services to serve Tensorflow/MXNet models. - Support run distributed Tensorflow jobs with simple configs. - Support run user-specified Docker images. - Support specify GPU and other resources. - Support launch tensorboard if user specified. - Support customized DNS name for roles (like tensorboard.$user.$domain:6006) *Why this name?* - Because Submarine is the only vehicle can take human to deep places. B-) Compare to other projects: !image-2018-04-09-14-35-16-778.png! *Notes:* * GPU Isolation of XLearning project is achieved by patched YARN, which is different from community’s GPU isolation solution. ** XLearning needs few modification to read ClusterSpec from env. *References:* - TensorflowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark - TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN - Spark Deep Learning (Databricks): https://github.com/databricks/spark-deep-learning - XLearning (Qihoo360): https://github.com/Qihoo360/XLearning - Kubeflow (Google): https://github.com/kubeflow/kubeflow -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7941) Transitive dependencies for component are not resolved
[ https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi reassigned YARN-7941: Assignee: Billie Rinaldi > Transitive dependencies for component are not resolved > --- > > Key: YARN-7941 > URL: https://issues.apache.org/jira/browse/YARN-7941 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-7941.1.patch > > > It is observed that transitive dependencies are not resolved as a result one > of the component is started earlier. > Ex : In HBase app, > master is independent component, > regionserver is depends on master. > hbaseclient depends on regionserver, > but I always see that HBaseClient is launched before regionserver. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7941) Transitive dependencies for component are not resolved
[ https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-7941: - Attachment: YARN-7941.1.patch > Transitive dependencies for component are not resolved > --- > > Key: YARN-7941 > URL: https://issues.apache.org/jira/browse/YARN-7941 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Priority: Major > Attachments: YARN-7941.1.patch > > > It is observed that transitive dependencies are not resolved as a result one > of the component is started earlier. > Ex : In HBase app, > master is independent component, > regionserver is depends on master. > hbaseclient depends on regionserver, > but I always see that HBaseClient is launched before regionserver. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
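The intended behavior in YARN-7941 — hbaseclient waits for regionserver, which waits for master — amounts to launching components in topological order of their dependency graph. A minimal sketch of that ordering (illustrative only; not the actual YARN service master code):

```java
import java.util.*;

// Hypothetical sketch: depth-first topological ordering of components so that
// every component is launched only after all of its transitive dependencies.
public class ComponentOrder {
    static List<String> launchOrder(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String c : deps.keySet()) {
            visit(c, deps, visited, order);
        }
        return order;
    }

    private static void visit(String c, Map<String, List<String>> deps,
                              Set<String> visited, List<String> order) {
        if (!visited.add(c)) {
            return; // already handled
        }
        // Recurse into dependencies first, which is what makes the
        // ordering transitive rather than only one level deep.
        for (String d : deps.getOrDefault(c, List.of())) {
            visit(d, deps, visited, order);
        }
        order.add(c);
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("hbaseclient", List.of("regionserver"));
        deps.put("regionserver", List.of("master"));
        deps.put("master", List.of());
        System.out.println(launchOrder(deps)); // [master, regionserver, hbaseclient]
    }
}
```

With this ordering, hbaseclient can never be launched before regionserver, which is exactly the bug reported above.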
[jira] [Updated] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7189: -- Affects Version/s: 2.9.0 2.8.3 3.0.1 Description: Once the docker run command is executed, the docker container is created unless the return code is 125 meaning that the run command itself failed (https://docs.docker.com/engine/reference/run/#exit-status). Any error that happens after the docker run needs to remove the container during cleanup. {noformat:title=container-executor.c:launch_docker_container_as_user} snprintf(docker_command_with_binary, command_size, "%s %s", docker_binary, docker_command); fprintf(LOGFILE, "Launching docker container...\n"); FILE* start_docker = popen(docker_command_with_binary, "r"); {noformat} This is fixed by YARN-5366, which changes how we remove containers. However, that was committed into 3.1.0. 2.8, 2.9, and 3.0 are all affected was: Once the docker run command is executed, the docker container is created unless the return code is 125 meaning that the run command itself failed (https://docs.docker.com/engine/reference/run/#exit-status). Any error that happens after the docker run needs to remove the container during cleanup. 
{noformat:title=container-executor.c:launch_docker_container_as_user} snprintf(docker_command_with_binary, command_size, "%s %s", docker_binary, docker_command); fprintf(LOGFILE, "Launching docker container...\n"); FILE* start_docker = popen(docker_command_with_binary, "r"); {noformat} > Container-executor doesn't remove Docker containers that error out early > > > Key: YARN-7189 > URL: https://issues.apache.org/jira/browse/YARN-7189 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 2.9.0, 2.8.3, 3.0.1 >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > > Once the docker run command is executed, the docker container is created > unless the return code is 125 meaning that the run command itself failed > (https://docs.docker.com/engine/reference/run/#exit-status). Any error that > happens after the docker run needs to remove the container during cleanup. > {noformat:title=container-executor.c:launch_docker_container_as_user} > snprintf(docker_command_with_binary, command_size, "%s %s", docker_binary, > docker_command); > fprintf(LOGFILE, "Launching docker container...\n"); > FILE* start_docker = popen(docker_command_with_binary, "r"); > {noformat} > This is fixed by YARN-5366, which changes how we remove containers. However, > that was committed into 3.1.0. 2.8, 2.9, and 3.0 are all affected -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
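The cleanup rule stated in YARN-7189 reduces to one decision: after `docker run` exits, a container exists (and must be removed) unless the exit status was 125, which means the run command itself failed. A small sketch of that decision, with illustrative names (not the container-executor code itself):

```java
// Hypothetical sketch of the cleanup decision described above: docker run
// exit status 125 means no container was created; any other status leaves a
// container behind that cleanup must remove with "docker rm".
public class DockerCleanup {
    // Per the docker run exit-status documentation referenced in the issue.
    static final int DOCKER_RUN_ITSELF_FAILED = 125;

    static boolean needsRemoval(int dockerRunExitStatus) {
        return dockerRunExitStatus != DOCKER_RUN_ITSELF_FAILED;
    }

    public static void main(String[] args) {
        System.out.println(needsRemoval(0));   // true: container ran and exited
        System.out.println(needsRemoval(1));   // true: app failed inside the container
        System.out.println(needsRemoval(125)); // false: docker run failed, nothing created
    }
}
```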
[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable
[ https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431317#comment-16431317 ] Shane Kumpf commented on YARN-7667: --- The latest patch lgtm. > Docker Stop grace period should be configurable > --- > > Key: YARN-7667 > URL: https://issues.apache.org/jira/browse/YARN-7667 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-7667.001.patch, YARN-7667.002.patch, > YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch, > YARN-7667.006.patch > > > {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never > called. So, the stop uses the 10 second default grace period from docker -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8018) Yarn Service Upgrade: Add support for initiating service upgrade
[ https://issues.apache.org/jira/browse/YARN-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8018: Target Version/s: 3.1.1 > Yarn Service Upgrade: Add support for initiating service upgrade > > > Key: YARN-8018 > URL: https://issues.apache.org/jira/browse/YARN-8018 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8018.001.patch, YARN-8018.002.patch, > YARN-8018.003.patch, YARN-8018.004.patch, YARN-8018.005.patch, > YARN-8018.006.patch, YARN-8018.007.patch > > > Add support for initiating service upgrade which includes the following main > changes: > # Service API to initiate upgrade > # Persist service version on hdfs > # Start the upgraded version of service -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7939) Yarn Service Upgrade: add support to upgrade a component instance
[ https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-7939: Target Version/s: 3.1.1 > Yarn Service Upgrade: add support to upgrade a component instance > -- > > Key: YARN-7939 > URL: https://issues.apache.org/jira/browse/YARN-7939 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-7939.001.patch, YARN-7939.002.patch, > YARN-7939.003.patch, YARN-7939.004.patch > > > Yarn core supports in-place upgrade of containers. A yarn service can > leverage that to provide in-place upgrade of component instances. Please see > YARN-7512 for details. > Will add support to upgrade a single component instance first and then > iteratively add other APIs and features. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable
[ https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431176#comment-16431176 ] Jason Lowe commented on YARN-7667: -- The TestContainerManager failure is unrelated, see YARN-7145. The TestContainerSchedulerQueuing failure is being tracked by YARN-7700. +1 lgtm. I'll wait to make sure Shane is good with the latest patch before committing. > Docker Stop grace period should be configurable > --- > > Key: YARN-7667 > URL: https://issues.apache.org/jira/browse/YARN-7667 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-7667.001.patch, YARN-7667.002.patch, > YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch, > YARN-7667.006.patch > > > {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never > called. So, the stop uses the 10 second default grace period from docker -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
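The shape of the fix under review in YARN-7667 is: read a grace period from configuration, fall back to docker's own 10-second default when unset, and pass the result to {{setGracePeriod}} instead of never calling it. A minimal sketch — the property name and classes here are illustrative stand-ins, not the actual Hadoop configuration key or NodeManager code:

```java
import java.util.Properties;

// Hypothetical sketch of a configurable docker stop grace period.
// "docker.stop.grace-period" is an assumed property name for illustration.
public class StopGrace {
    static final int DEFAULT_GRACE_SECONDS = 10; // docker's built-in default

    static int resolveGracePeriod(Properties conf) {
        String v = conf.getProperty("docker.stop.grace-period");
        return v == null ? DEFAULT_GRACE_SECONDS : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(resolveGracePeriod(conf)); // 10: unset, use default
        conf.setProperty("docker.stop.grace-period", "30");
        System.out.println(resolveGracePeriod(conf)); // 30: admin override
    }
}
```

The resolved value would then be handed to the stop command (e.g. `stopCommand.setGracePeriod(seconds)`), so the default is no longer hard-wired to docker's 10 seconds.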
[jira] [Updated] (YARN-7939) Yarn Service Upgrade: add support to upgrade a component instance
[ https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-7939: Attachment: YARN-7939.004.patch > Yarn Service Upgrade: add support to upgrade a component instance > -- > > Key: YARN-7939 > URL: https://issues.apache.org/jira/browse/YARN-7939 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-7939.001.patch, YARN-7939.002.patch, > YARN-7939.003.patch, YARN-7939.004.patch > > > Yarn core supports in-place upgrade of containers. A yarn service can > leverage that to provide in-place upgrade of component instances. Please see > YARN-7512 for details. > Will add support to upgrade a single component instance first and then > iteratively add other APIs and features. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable
[ https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431153#comment-16431153 ] genericqa commented on YARN-7667: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 7s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 12s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 30s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}110m 28s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager | | | hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-7667 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918219/YARN-7667.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | |
[jira] [Commented] (YARN-7939) Yarn Service Upgrade: add support to upgrade a component instance
[ https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431149#comment-16431149 ] Gour Saha commented on YARN-7939: - bq. There is no {{NEEDS_UPGRADE}} state at service level, so the json that you posted in your example for service level is incorrect. [~csingh], shouldn't the API submission itself have thrown a validation error since ServiceState NEEDS_UPGRADE does not even exist? > Yarn Service Upgrade: add support to upgrade a component instance > -- > > Key: YARN-7939 > URL: https://issues.apache.org/jira/browse/YARN-7939 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-7939.001.patch, YARN-7939.002.patch, > YARN-7939.003.patch > > > Yarn core supports in-place upgrade of containers. A yarn service can > leverage that to provide in-place upgrade of component instances. Please see > YARN-7512 for details. > Will add support to upgrade a single component instance first and then > iteratively add other APIs and features. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
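The validation Gour asks about could be done by parsing the requested state name against the enum at submission time. A minimal sketch of that shape, where the enum values and class names are stand-ins and not the actual Hadoop yarn-service API:

```java
// Hypothetical stand-in for the service-level state enum; the real
// ServiceState in Hadoop has different members.
enum ServiceState { STOPPED, STARTED, UPGRADING }

class StateValidator {
    // Returns the parsed state, or throws IllegalArgumentException for
    // names (such as "NEEDS_UPGRADE") that do not exist in the enum,
    // so a bad request is rejected instead of silently accepted.
    public static ServiceState validate(String requested) {
        try {
            return ServiceState.valueOf(requested);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Invalid service state in request: " + requested, e);
        }
    }
}
```

With this in the submission path, a JSON body carrying a nonexistent state would fail fast with a clear message rather than reaching the state machine.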
[jira] [Commented] (YARN-8133) Doc link broken for yarn-service from overview page.
[ https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431130#comment-16431130 ] Gour Saha commented on YARN-8133: - [~rohithsharma], thank you for the patch. Similar problems are there in all these files. Can you fix them as well? src/site/markdown/yarn-service/Concepts.md src/site/markdown/yarn-service/QuickStart.md src/site/markdown/yarn-service/RegistryDNS.md src/site/markdown/yarn-service/ServiceDiscovery.md > Doc link broken for yarn-service from overview page. > > > Key: YARN-8133 > URL: https://issues.apache.org/jira/browse/YARN-8133 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-8133.01.patch > > > I see that documentation link broken from overview page. > Any link clicking from > http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html > page causing an error. > It looks like Overview page, redirecting with .md page which doesn't exist. > It should redirect to *.html page -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
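The broken links above come from the markdown sources pointing at `*.md` targets, while the site build publishes `*.html` pages. A sketch of rewriting such links across the listed files (a hypothetical helper, not part of the attached patch):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkFix {
    // Matches markdown-style link targets ending in .md, with an optional
    // #anchor, e.g. [QuickStart](QuickStart.md) or [C](Concepts.md#intro).
    private static final Pattern MD_LINK =
        Pattern.compile("(\\]\\([^)\\s]+?)\\.md(#[^)]*)?\\)");

    // Rewrites each .md target to the .html page the site build produces,
    // preserving any anchor fragment.
    public static String fixLinks(String markdown) {
        Matcher m = MD_LINK.matcher(markdown);
        return m.replaceAll("$1.html$2)");
    }
}
```

Running such a rewrite over Concepts.md, QuickStart.md, RegistryDNS.md, and ServiceDiscovery.md would address the remaining files Gour lists.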
[jira] [Commented] (YARN-7930) Add configuration to initialize RM with configured labels.
[ https://issues.apache.org/jira/browse/YARN-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431066#comment-16431066 ] genericqa commented on YARN-7930: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 47s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 267 unchanged - 0 fixed = 268 total (was 267) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 6s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 44s{color} | {color:red} hadoop-yarn-api in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 3m 7s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 85m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields | | | hadoop.yarn.nodelabels.TestCommonNodeLabelsManager | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-7930 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918202/YARN-7930.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2bd04a30ccc2 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e9b9f48 | | maven | version: Apache Maven 3.3.9 |
[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""
[ https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431063#comment-16431063 ] Wangda Tan commented on YARN-8116: -- [~csingh], thanks for working on the fix. It's better to include a simple UT to avoid regression since this is in a critical path of NM recovery. > Nodemanager fails with NumberFormatException: For input string: "" > -- > > Key: YARN-8116 > URL: https://issues.apache.org/jira/browse/YARN-8116 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8116.001.patch > > > Steps followed. > 1) Update nodemanager debug delay config > {code} > <property> > <name>yarn.nodemanager.delete.debug-delay-sec</name> > <value>350</value> > </property> > {code} > 2) Launch distributed shell application multiple times > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn jar > hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 120" > -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar > hadoop-yarn-applications-distributedshell-*.jar{code} > 3) restart NM > Nodemanager fails to start with below error. > {code:title=NM log} > 2018-03-23 21:32:14,437 INFO monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: > true > 2018-03-23 21:32:14,439 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set > as 3600. 
The logs will be aggregated every 3600 seconds > 2018-03-23 21:32:14,455 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl > failed in state INITED > java.lang.NumberFormatException: For input string: "" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:601) > at java.lang.Long.parseLong(Long.java:631) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960) > 2018-03-23 21:32:14,458 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceStop(148)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService > waiting for pending aggregation during exit > 2018-03-23 21:32:14,460 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state > INITED > 
java.lang.NumberFormatException: For input string: "" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:601) > at java.lang.Long.parseLong(Long.java:631) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at >
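The failure in the quoted log is `Long.parseLong("")` on a value read back from the NM leveldb state store during recovery. A hedged sketch of the defensive shape such a fix could take (a hypothetical helper, not the actual YARN-8116 patch):

```java
class RecoveryParse {
    // Treats a missing or blank stored value as "absent" and falls back to
    // a default, instead of letting Long.parseLong("") abort the whole
    // NM container-state recovery with a NumberFormatException.
    public static long parseLongOrDefault(String stored, long dflt) {
        if (stored == null || stored.trim().isEmpty()) {
            return dflt;
        }
        return Long.parseLong(stored.trim());
    }
}
```

As Wangda notes above, a unit test feeding an empty stored value through recovery would guard this path against regression.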
[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""
[ https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431046#comment-16431046 ] genericqa commented on YARN-8116: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 23s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 40s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 48s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 72m 11s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-8116 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918195/YARN-8116.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 5a4b21daf9e5 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e9b9f48 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/20276/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20276/testReport/ | | Max. process+thread count | 410 (vs. ulimit of 1) | | modules | C:
[jira] [Commented] (YARN-8100) Support API interface to query cluster attributes and attribute to nodes
[ https://issues.apache.org/jira/browse/YARN-8100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431039#comment-16431039 ] genericqa commented on YARN-8100: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} YARN-3409 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 45s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 0s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 37s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 13s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 0s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 10s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in YARN-3409 has 1 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 6s{color} | {color:green} YARN-3409 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 26m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 25s{color} | {color:green} root: The patch generated 0 new + 471 unchanged - 3 fixed = 471 total (was 474) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 46s{color} | {color:green} hadoop-yarn-api in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 22s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 37s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 39s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 32s{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 55s{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} |
[jira] [Commented] (YARN-8131) Provide CLI option to DS for publishing entities into sub application
[ https://issues.apache.org/jira/browse/YARN-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431028#comment-16431028 ] Haibo Chen commented on YARN-8131: -- I'd argue that we'd like to mark subApplication Entity as LimitedPrivate("Tez") specifically for this reason. The SubApplicationEntity is designed specifically to address Tez's use case where one YARN AM is shared to run multiple user queries. Hence, we should rely on Tez or a similar use case to test the SubApplicationEntity API. > Provide CLI option to DS for publishing entities into sub application > - > > Key: YARN-8131 > URL: https://issues.apache.org/jira/browse/YARN-8131 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Priority: Major > > Post YARN-6936, TimelineV2Client exposes API to publish entities into sub > application table. We should add this CLI option in DS so that the API can be > tested. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable
[ https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431008#comment-16431008 ] Eric Badger commented on YARN-7667: --- [~shaneku...@gmail.com], shoot that's pretty embarrassing. I updated trunk but forgot to rebase. Did a quick rebase and now patch 006 should be patch 004 plus the one checkstyle fix. Sorry for the weird patch 005. > Docker Stop grace period should be configurable > --- > > Key: YARN-7667 > URL: https://issues.apache.org/jira/browse/YARN-7667 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-7667.001.patch, YARN-7667.002.patch, > YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch, > YARN-7667.006.patch > > > {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never > called. So, the stop uses the 10 second default grace period from docker -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7667) Docker Stop grace period should be configurable
[ https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7667: -- Attachment: YARN-7667.006.patch > Docker Stop grace period should be configurable > --- > > Key: YARN-7667 > URL: https://issues.apache.org/jira/browse/YARN-7667 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-7667.001.patch, YARN-7667.002.patch, > YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch, > YARN-7667.006.patch > > > {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never > called. So, the stop uses the 10 second default grace period from docker -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable
[ https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430998#comment-16430998 ] Shane Kumpf commented on YARN-7667: --- Thanks for updating the patch, [~ebadger]. I tested the 004 patch, as the 005 patch doesn't look right, and 004 looks good to me less that checkstyle issue. +1 (non-binding) once checkstyle is addressed. > Docker Stop grace period should be configurable > --- > > Key: YARN-7667 > URL: https://issues.apache.org/jira/browse/YARN-7667 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-7667.001.patch, YARN-7667.002.patch, > YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch > > > {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never > called. So, the stop uses the 10 second default grace period from docker -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
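The shape of the fix under review is to read a grace period from configuration and pass it to the stop command via the otherwise-unused `setGracePeriod`. A minimal sketch under assumed names (the property key and classes below are stand-ins, not the actual patch):

```java
class DockerStop {
    // Docker's own default when no grace period is given to `docker stop`.
    static final int DEFAULT_GRACE_SECONDS = 10;

    // Stand-in for DockerStopCommand; only the grace-period part is modeled.
    static class StopCommand {
        int graceSeconds = DEFAULT_GRACE_SECONDS;
        StopCommand setGracePeriod(int seconds) {  // mirrors setGracePeriod
            this.graceSeconds = seconds;
            return this;
        }
    }

    // Builds a stop command, overriding the default grace period when the
    // (hypothetical) configuration key is set.
    public static StopCommand buildStop(java.util.Map<String, String> conf) {
        StopCommand cmd = new StopCommand();
        String v = conf.get(
            "yarn.nodemanager.runtime.linux.docker.stop.grace-period"); // assumed key
        if (v != null) {
            cmd.setGracePeriod(Integer.parseInt(v));
        }
        return cmd;
    }
}
```

Without such wiring, every container stop falls back to Docker's 10 second default, which the issue description calls out.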
[jira] [Commented] (YARN-8131) Provide CLI option to DS for publishing entities into sub application
[ https://issues.apache.org/jira/browse/YARN-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430992#comment-16430992 ] Rohith Sharma K S commented on YARN-8131: - Primarily DS is used for verifying YARN service features. Though *logically* DS doesn't come under the sub application concept, there should be a way to publish into sub app so that this feature can be verified. If we don't want to provide a CLI option, maybe by default the DS AM can make use of the newer API so that data goes into both tables. Otherwise, this API will be untested until Tez or any other framework makes use of this API. > Provide CLI option to DS for publishing entities into sub application > - > > Key: YARN-8131 > URL: https://issues.apache.org/jira/browse/YARN-8131 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Priority: Major > > Post YARN-6936, TimelineV2Client exposes API to publish entities into sub > application table. We should add this CLI option in DS so that the API can be > tested. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8133) Doc link broken for yarn-service from overview page.
[ https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S reassigned YARN-8133: --- Assignee: Rohith Sharma K S > Doc link broken for yarn-service from overview page. > > > Key: YARN-8133 > URL: https://issues.apache.org/jira/browse/YARN-8133 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-8133.01.patch > > > I see that documentation link broken from overview page. > Any link clicking from > http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html > page causing an error. > It looks like Overview page, redirecting with .md page which doesn't exist. > It should redirect to *.html page -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8134) Support specifying node resources in SLS
[ https://issues.apache.org/jira/browse/YARN-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430980#comment-16430980 ] genericqa commented on YARN-8134: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 27s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 12s{color} | {color:orange} hadoop-tools/hadoop-sls: The patch generated 3 new + 14 unchanged - 0 fixed = 17 total (was 14) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 6s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 49s{color} | {color:red} hadoop-tools/hadoop-sls generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 35s{color} | {color:green} hadoop-sls in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 21s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 58m 38s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-tools/hadoop-sls | | | org.apache.hadoop.yarn.sls.SLSRunner.startNM() makes inefficient use of keySet iterator instead of entrySet iterator At SLSRunner.java:keySet iterator instead of entrySet iterator At SLSRunner.java:[line 340] | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-8134 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918196/YARN-8134.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 8d557f569f3f 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ac32b35 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20274/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt | | findbugs |
[jira] [Comment Edited] (YARN-7939) Yarn Service Upgrade: add support to upgrade a component instance
[ https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430951#comment-16430951 ] Chandni Singh edited comment on YARN-7939 at 4/9/18 5:58 PM: - [~eyang] Upgraded is either 2 steps when finalization is done automatically, or 3 steps when finalization is done manually: Step 1: Initiate service level upgrade. This requires posting the newer spec. Here is an example: {code:java} { "name": "test1", "version" : "v2", "state": "UPGRADING", "components" : [ { "name": "sleeper", "number_of_containers": 2, "launch_command": "sleep 120", "resource": { "cpus": 1, "memory": "256" } } ] }{code} This json is the spec json and not the state json. There is no {{NEEDS_UPGRADE}} state at service level, so the json that you posted in your example for service level is incorrect. Step 2: Trigger upgrade of component or individual component instances. An example of this request is {code:java} { "state": "UPGRADING", "component_instance_name": "sleeper-0" }{code} The {{NEEDS_UPGRADE}} state is not something user specifies. All the components and their instances which have changes (this is figured out when the new spec is provided) have there state set as NEEDS_UPGRADE. This tells the user which component or instances have not yet been upgraded. Continuing with the above example, once the service upgrade is initiated, {{sleeper}} comp and both its instance will be in state {{NEEDS_UPGRADE}}. After triggering the upgrade of {{sleeper-0}} it will become {{STABLE}} at some point. However, {{sleeper-1}} will still be in {{NEEDS_UPGRADE}} state, which indicates that this instance still needs to be upgraded. was (Author: csingh): [~eyang] Upgraded is either 2 steps when finalization is done automatically, or 3 steps when finalization is done manually: Step 1: Initiate service level upgrade. This requires posting the newer spec. 
Here is an example: {code:java} { "name": "test1", "version" : "v2", "state": "UPGRADING", "components" : [ { "name": "sleeper", "number_of_containers": 2, "launch_command": "sleep 120", "resource": { "cpus": 1, "memory": "256" } } ] }{code} This json is the spec json and not the state json. There is no {{NEEDS_UPGRADE}} state at service level, so the json that you posted in your example for service level is incorrect. Step 2: Trigger upgrade of component or individual component instances. An example of this request is {code:java} { "state": "UPGRADING", "component_instance_name": "sleeper-0" }{code} > Yarn Service Upgrade: add support to upgrade a component instance > -- > > Key: YARN-7939 > URL: https://issues.apache.org/jira/browse/YARN-7939 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-7939.001.patch, YARN-7939.002.patch, > YARN-7939.003.patch > > > Yarn core supports in-place upgrade of containers. A yarn service can > leverage that to provide in-place upgrade of component instances. Please see > YARN-7512 for details. > Will add support to upgrade a single component instance first and then > iteratively add other APIs and features. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
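[Editorial sketch] The two-step upgrade flow described in the comment above can be assembled programmatically. This is a hedged illustration: the JSON bodies come from the comment itself, while the endpoint path is an assumption based on the usual YARN services REST API convention and is not stated in the comment.

```python
import json

# Sketch of the two payloads from the comment above. SERVICE_URL is an
# assumed path (YARN services REST convention); only the JSON bodies are
# taken from the comment.

SERVICE_URL = "http://localhost:8088/app/v1/services/test1"  # assumed

def service_upgrade_spec(name, version, components):
    # Step 1: PUT the full (newer) spec with state UPGRADING. This is the
    # spec json, not a state json; NEEDS_UPGRADE never appears in it.
    return {"name": name, "version": version, "state": "UPGRADING",
            "components": components}

def instance_upgrade_request(instance_name):
    # Step 2: trigger upgrade of one component instance. NEEDS_UPGRADE is
    # reported by the server afterwards, never set by the user.
    return {"state": "UPGRADING", "component_instance_name": instance_name}

spec = service_upgrade_spec("test1", "v2", [
    {"name": "sleeper", "number_of_containers": 2,
     "launch_command": "sleep 120",
     "resource": {"cpus": 1, "memory": "256"}}])
print(json.dumps(spec, indent=2))
print(json.dumps(instance_upgrade_request("sleeper-0")))
```

After step 2, per the comment, `sleeper-0` eventually becomes STABLE while `sleeper-1` remains in NEEDS_UPGRADE until its own upgrade is triggered.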
[jira] [Commented] (YARN-7939) Yarn Service Upgrade: add support to upgrade a component instance
[ https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430951#comment-16430951 ] Chandni Singh commented on YARN-7939: - [~eyang] Upgraded is either 2 steps when finalization is done automatically, or 3 steps when finalization is done manually: Step 1: Initiate service level upgrade. This requires posting the newer spec. Here is an example: {code:java} { "name": "test1", "version" : "v2", "state": "UPGRADING", "components" : [ { "name": "sleeper", "number_of_containers": 2, "launch_command": "sleep 120", "resource": { "cpus": 1, "memory": "256" } } ] }{code} This json is the spec json and not the state json. There is no {{NEEDS_UPGRADE}} state at service level, so the json that you posted in your example for service level is incorrect. Step 2: Trigger upgrade of component or individual component instances. An example of this request is {code:java} { "state": "UPGRADING", "component_instance_name": "sleeper-0" }{code} > Yarn Service Upgrade: add support to upgrade a component instance > -- > > Key: YARN-7939 > URL: https://issues.apache.org/jira/browse/YARN-7939 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-7939.001.patch, YARN-7939.002.patch, > YARN-7939.003.patch > > > Yarn core supports in-place upgrade of containers. A yarn service can > leverage that to provide in-place upgrade of component instances. Please see > YARN-7512 for details. > Will add support to upgrade a single component instance first and then > iteratively add other APIs and features. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7941) Transitive dependencies for component are not resolved
[ https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430939#comment-16430939 ] Billie Rinaldi edited comment on YARN-7941 at 4/9/18 5:35 PM: -- I think I see the problem. The dependency readiness evaluation is checking whether the number of ready containers is less than the number of desired containers. But the number of desired containers is not being set until a flex event is issued for the component, so we are checking that the number of ready containers is not less than 0. I think we can fix this by initializing the number of desired containers in the Component constructor. was (Author: billie.rinaldi): I think I see the problem. The dependency readiness evaluation is checking whether the number of ready containers equals the number of desired containers. But the number of desired containers is not being set until a flex event is issued for the component, so we are checking that the number of ready containers is not less than 0. I think we can fix this by initializing the number of desired containers in the Component constructor. > Transitive dependencies for component are not resolved > --- > > Key: YARN-7941 > URL: https://issues.apache.org/jira/browse/YARN-7941 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Priority: Major > > It is observed that transitive dependencies are not resolved as a result one > of the component is started earlier. > Ex : In HBase app, > master is independent component, > regionserver is depends on master. > hbaseclient depends on regionserver, > but I always see that HBaseClient is launched before regionserver. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7941) Transitive dependencies for component are not resolved
[ https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430939#comment-16430939 ] Billie Rinaldi commented on YARN-7941: -- I think I see the problem. The dependency readiness evaluation is checking whether the number of ready containers equals the number of desired containers. But the number of desired containers is not being set until a flex event is issued for the component, so we are checking that the number of ready containers is not less than 0. I think we can fix this by initializing the number of desired containers in the Component constructor. > Transitive dependencies for component are not resolved > --- > > Key: YARN-7941 > URL: https://issues.apache.org/jira/browse/YARN-7941 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Priority: Major > > It is observed that transitive dependencies are not resolved as a result one > of the component is started earlier. > Ex : In HBase app, > master is independent component, > regionserver is depends on master. > hbaseclient depends on regionserver, > but I always see that HBaseClient is launched before regionserver. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
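[Editorial sketch] The bug analysis above is easy to reproduce in miniature. The following is a hypothetical Python model, not the actual Java service code: it shows why a desired-container count that stays 0 until a flex event makes every dependency look ready, and why initializing it in the constructor restores the ordering.

```python
# Hypothetical model of the readiness check described above; class and
# method names are illustrative, not the real Component class.

class Component:
    def __init__(self, name, desired_containers, init_in_ctor=True):
        self.name = name
        self.ready_containers = 0
        # The bug: desired stays 0 until a flex event is issued, so the
        # readiness check below can never hold a dependent component back.
        self.desired_containers = desired_containers if init_in_ctor else 0

    def dependencies_ready(self, dependencies):
        # A dependency is satisfied once its ready count reaches its desired
        # count; with desired == 0 this is trivially true (0 >= 0).
        return all(d.ready_containers >= d.desired_containers
                   for d in dependencies)

# Buggy behavior: master's desired count was never set, so hbaseclient
# believes its dependency is already satisfied and launches too early.
master = Component("master", 1, init_in_ctor=False)
print(Component("hbaseclient", 1).dependencies_ready([master]))  # True (too early)

# Fixed behavior: initializing desired containers in the constructor keeps
# dependents waiting until master actually has a ready container.
master = Component("master", 1, init_in_ctor=True)
print(Component("hbaseclient", 1).dependencies_ready([master]))  # False
```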
[jira] [Comment Edited] (YARN-8131) Provide CLI option to DS for publishing entities into sub application
[ https://issues.apache.org/jira/browse/YARN-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430934#comment-16430934 ] Vrushali C edited comment on YARN-8131 at 4/9/18 5:28 PM: -- I agree that the end user should not be concerned about data going to specific tables. The framework should be handling this, like the Tez AM. In the distributed shell example, we should figure out if there is any data that is equivalent to a sub-app use case. If not, we should write a different one to test querying/writing out subapp data. It should not be an CLI option. The flow name, version and flow run id are inputs as CLI options and that is different from a sub-app query. If we set the wrong example in DS, then it is likely to confuse frameworks about using subapp data. Let's have a good example for sub-application data. One thing is for sure, the data should be written "logically" by the AM to timeline service without caring/knowing where exactly the data ends up in the backend. Meaning, if it is a flow level value, it's stored at the flow level. If it's an application metric, it's at the app level. The AM need not be concerned that there are two tables at the backend, flow & application. All it should care for is, that this particular value belongs to flow level, that particular value makes sense at the app level and some other third value makes sense at the sub-app level. was (Author: vrushalic): I agree that the end user should not be concerned about data going to specific tables. The framework should be handling this, like the Tez AM. In the distributed shell example, we should figure out if there is any data that is equivalent to a sub-app use case. If not, we should write a different one to test querying/writing out subapp data. It should not be an CLI option. The flow name, version and flow run id are to CLI options and that is different from a sub-app query. 
If we set the wrong example in DS, then it is likely to confuse frameworks about using subapp data. Let's have a good example for sub-application data. One thing is for sure, the data should be written "logically" by the AM to timeline service without caring/knowing where exactly the data ends up in the backend. Meaning, if it is a flow level value, it's stored at the flow level. If it's an application metric, it's at the app level. The AM need not be concerned that there are two tables at the backend, flow & application. All it should care for is, that this particular value belongs to flow level, that particular value makes sense at the app level and some other third value makes sense at the sub-app level. > Provice CLI option to DS for publishing entities into sub application > - > > Key: YARN-8131 > URL: https://issues.apache.org/jira/browse/YARN-8131 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Priority: Major > > Post YARN-6936, TimelineV2Client exposes API to publish entities into sub > application table. We should add this CLI option in DS so that API can be > tested. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8131) Provide CLI option to DS for publishing entities into sub application
[ https://issues.apache.org/jira/browse/YARN-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430934#comment-16430934 ] Vrushali C commented on YARN-8131: -- I agree that the end user should not be concerned about data going to specific tables. The framework should be handling this, like the Tez AM. In the distributed shell example, we should figure out if there is any data that is equivalent to a sub-app use case. If not, we should write a different one to test querying/writing out subapp data. It should not be an CLI option. The flow name, version and flow run id are to CLI options and that is different from a sub-app query. If we set the wrong example in DS, then it is likely to confuse frameworks about using subapp data. Let's have a good example for sub-application data. One thing is for sure, the data should be written "logically" by the AM to timeline service without caring/knowing where exactly the data ends up in the backend. Meaning, if it is a flow level value, it's stored at the flow level. If it's an application metric, it's at the app level. The AM need not be concerned that there are two tables at the backend, flow & application. All it should care for is, that this particular value belongs to flow level, that particular value makes sense at the app level and some other third value makes sense at the sub-app level. > Provice CLI option to DS for publishing entities into sub application > - > > Key: YARN-8131 > URL: https://issues.apache.org/jira/browse/YARN-8131 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Priority: Major > > Post YARN-6936, TimelineV2Client exposes API to publish entities into sub > application table. We should add this CLI option in DS so that API can be > tested. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7930) Add configuration to initialize RM with configured labels.
[ https://issues.apache.org/jira/browse/YARN-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-7930: Attachment: YARN-7930.003.patch > Add configuration to initialize RM with configured labels. > -- > > Key: YARN-7930 > URL: https://issues.apache.org/jira/browse/YARN-7930 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-7930.001.patch, YARN-7930.002.patch, > YARN-7930.003.patch > > > At present, the only way to create labels is using admin API. Sometimes, > there is a requirement to start the cluster with pre-configured node labels. > This Jira introduces yarn configurations to start RM with predefined node > labels. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8133) Doc link broken for yarn-service from overview page.
[ https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430886#comment-16430886 ] genericqa commented on YARN-8133: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 36m 32s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 49m 39s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-8133 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918182/YARN-8133.01.patch | | Optional Tests | asflicense mvnsite | | uname | Linux 49d1b143969e 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ac32b35 | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 341 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20272/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Doc link broken for yarn-service from overview page. > > > Key: YARN-8133 > URL: https://issues.apache.org/jira/browse/YARN-8133 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-8133.01.patch > > > I see that documentation link broken from overview page. > Any link clicking from > http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html > page causing an error. > It looks like Overview page, redirecting with .md page which doesn't exist. > It should redirect to *.html page -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
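[Editorial sketch] The symptom in YARN-8133 is relative `.md` link targets surviving into the rendered HTML, where they 404. The helper below is hypothetical, shown only to illustrate the rewrite the Overview page needed; the actual patch edits the markdown sources directly.

```python
import re

# Illustrative only: rewrite relative markdown link targets such as
# (QuickStart.md) to (QuickStart.html), leaving absolute URLs untouched.

def fix_doc_links(markdown_text):
    # Negative lookahead skips absolute http(s) URLs; the backreference
    # keeps the original target name and swaps only the extension.
    return re.sub(r"\((?!https?://)([\w./-]+?)\.md\)", r"(\1.html)",
                  markdown_text)

print(fix_doc_links("[Quick Start](QuickStart.md)"))
# -> [Quick Start](QuickStart.html)
```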
[jira] [Commented] (YARN-7781) Update YARN-Services-Examples.md to be in sync with the latest code
[ https://issues.apache.org/jira/browse/YARN-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430869#comment-16430869 ] Gour Saha commented on YARN-7781: - [~jianhe] is it ok if I take over this jira and make the final few necessary changes? > Update YARN-Services-Examples.md to be in sync with the latest code > --- > > Key: YARN-7781 > URL: https://issues.apache.org/jira/browse/YARN-7781 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Jian He >Priority: Major > Attachments: YARN-7781.01.patch, YARN-7781.02.patch, > YARN-7781.03.patch > > > Update YARN-Services-Examples.md to make the following additions/changes: > 1. Add an additional URL and PUT Request JSON to support flex: > Update to flex up/down the no of containers (instances) of a component of a > service > PUT URL – http://localhost:8088/app/v1/services/hello-world > PUT Request JSON > {code} > { > "components" : [ { > "name" : "hello", > "number_of_containers" : 3 > } ] > } > {code} > 2. Modify all occurrences of /ws/ to /app/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
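[Editorial sketch] The flex request in item 1 above can be built as follows. The URL and payload are taken from the example in the issue; actually sending the PUT (e.g. with curl and a JSON Content-Type header) is left out of the sketch.

```python
import json

# Assemble the flex payload and URL from the example above. Note the /app/
# prefix, which item 2 of the issue substitutes for the older /ws/ prefix.

def flex_request(component_name, number_of_containers):
    return {"components": [{"name": component_name,
                            "number_of_containers": number_of_containers}]}

url = "http://localhost:8088/app/v1/services/hello-world"
body = json.dumps(flex_request("hello", 3))
print(url)
print(body)
```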
[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""
[ https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430862#comment-16430862 ] Chandni Singh commented on YARN-8116: - Empty retry times list was being saved in the NMStore which caused this exception. Have a simple fix for it. [~leftnoteasy] could you please review? > Nodemanager fails with NumberFormatException: For input string: "" > -- > > Key: YARN-8116 > URL: https://issues.apache.org/jira/browse/YARN-8116 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8116.001.patch > > > Steps followed. > 1) Update nodemanager debug delay config > {code} > > yarn.nodemanager.delete.debug-delay-sec > 350 > {code} > 2) Launch distributed shell application multiple times > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn jar > hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 120" > -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar > hadoop-yarn-applications-distributedshell-*.jar{code} > 3) restart NM > Nodemanager fails to start with below error. > {code} > {code:title=NM log} > 2018-03-23 21:32:14,437 INFO monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: > true > 2018-03-23 21:32:14,439 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set > as 3600. 
The logs will be aggregated every 3600 seconds > 2018-03-23 21:32:14,455 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl > failed in state INITED > java.lang.NumberFormatException: For input string: "" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:601) > at java.lang.Long.parseLong(Long.java:631) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960) > 2018-03-23 21:32:14,458 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceStop(148)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService > waiting for pending aggregation during exit > 2018-03-23 21:32:14,460 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state > INITED > 
java.lang.NumberFormatException: For input string: "" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:601) > at java.lang.Long.parseLong(Long.java:631) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at >
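[Editorial sketch] The crash above is the classic empty-token parse failure: `Long.parseLong("")` throws NumberFormatException when an empty retry-times list was persisted to the NM state store. A hypothetical Python analogue, mirroring the symptom rather than the actual Java fix:

```python
# Hypothetical analogue of the failure in loadContainerState: int("") raises
# ValueError just as Long.parseLong("") raises NumberFormatException.

def parse_retry_times_buggy(raw):
    # "".split(",") yields [""], so an empty persisted value always fails.
    return [int(tok) for tok in raw.split(",")]

def parse_retry_times_safe(raw):
    # Skip empty tokens so an empty saved list recovers as "no retries".
    return [int(tok) for tok in raw.split(",") if tok.strip()]

try:
    parse_retry_times_buggy("")
except ValueError as exc:
    print("buggy parse failed:", exc)

print(parse_retry_times_safe(""))         # []
print(parse_retry_times_safe("100,200"))  # [100, 200]
```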
[jira] [Updated] (YARN-8133) Doc link broken for yarn-service from overview page.
[ https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8133: - Priority: Blocker (was: Major) > Doc link broken for yarn-service from overview page. > > > Key: YARN-8133 > URL: https://issues.apache.org/jira/browse/YARN-8133 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-8133.01.patch > > > I see that documentation link broken from overview page. > Any link clicking from > http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html > page causing an error. > It looks like Overview page, redirecting with .md page which doesn't exist. > It should redirect to *.html page -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8133) Doc link broken for yarn-service from overview page.
[ https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8133: - Target Version/s: 3.1.1 > Doc link broken for yarn-service from overview page. > > > Key: YARN-8133 > URL: https://issues.apache.org/jira/browse/YARN-8133 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-8133.01.patch > > > I see that documentation link broken from overview page. > Any link clicking from > http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html > page causing an error. > It looks like Overview page, redirecting with .md page which doesn't exist. > It should redirect to *.html page -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8134) Support specifying node resources in SLS
[ https://issues.apache.org/jira/browse/YARN-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-8134: Attachment: YARN-8134.patch > Support specifying node resources in SLS > > > Key: YARN-8134 > URL: https://issues.apache.org/jira/browse/YARN-8134 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-8134.patch > > > At present, all nodes have same resources in SLS. We need to add capability > to add different resources to different nodes in SLS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8134) Support specifying node resources in SLS
Abhishek Modi created YARN-8134: --- Summary: Support specifying node resources in SLS Key: YARN-8134 URL: https://issues.apache.org/jira/browse/YARN-8134 Project: Hadoop YARN Issue Type: Sub-task Reporter: Abhishek Modi Assignee: Abhishek Modi At present, all nodes have same resources in SLS. We need to add capability to add different resources to different nodes in SLS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""
[ https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8116: Attachment: YARN-8116.001.patch > Nodemanager fails with NumberFormatException: For input string: "" > -- > > Key: YARN-8116 > URL: https://issues.apache.org/jira/browse/YARN-8116 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8116.001.patch > > > Steps followed. > 1) Update nodemanager debug delay config > {code} > > yarn.nodemanager.delete.debug-delay-sec > 350 > {code} > 2) Launch distributed shell application multiple times > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn jar > hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 120" > -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar > hadoop-yarn-applications-distributedshell-*.jar{code} > 3) restart NM > Nodemanager fails to start with below error. > {code} > {code:title=NM log} > 2018-03-23 21:32:14,437 INFO monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: > true > 2018-03-23 21:32:14,439 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set > as 3600. 
The logs will be aggregated every 3600 seconds > 2018-03-23 21:32:14,455 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl > failed in state INITED > java.lang.NumberFormatException: For input string: "" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:601) > at java.lang.Long.parseLong(Long.java:631) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960) > 2018-03-23 21:32:14,458 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceStop(148)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService > waiting for pending aggregation during exit > 2018-03-23 21:32:14,460 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state > INITED > 
java.lang.NumberFormatException: For input string: "" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:601) > at java.lang.Long.parseLong(Long.java:631) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464) > at >
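The crash above comes from Long.parseLong being handed an empty string read back from the leveldb state store. A minimal, stdlib-only sketch of the defensive parsing such recovery code needs; the class and method names here are illustrative, not the actual NMLeveldbStateStoreService code:

```java
public class SafeParse {
    // Returns the parsed value, or the supplied default when the stored
    // string is null, empty, or whitespace-only -- exactly the input that
    // makes Long.parseLong throw NumberFormatException.
    static long parseLongOrDefault(String raw, long def) {
        if (raw == null || raw.trim().isEmpty()) {
            return def;
        }
        return Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        System.out.println(parseLongOrDefault("", -1L));     // default, no crash
        System.out.println(parseLongOrDefault("1234", -1L)); // normal case
    }
}
```

Whether the real fix should skip, default, or discard the corrupt record is a design decision for the patch; the point is only that the empty-string case must be handled before parsing.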
[jira] [Updated] (YARN-7667) Docker Stop grace period should be configurable
[ https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7667: -- Attachment: YARN-7667.005.patch > Docker Stop grace period should be configurable > --- > > Key: YARN-7667 > URL: https://issues.apache.org/jira/browse/YARN-7667 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-7667.001.patch, YARN-7667.002.patch, > YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch > > > {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never > called, so the stop uses the 10-second default grace period from Docker.
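Since {{setGracePeriod}} is never called, docker falls back to its built-in default. A hedged sketch of what wiring a configured grace period into the stop invocation could look like: `--time` is docker's real flag for overriding the 10-second default before SIGKILL, but the surrounding method and helper names are invented for illustration and are not the actual DockerStopCommand code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DockerStopSketch {
    static final int DOCKER_DEFAULT_GRACE_SECONDS = 10;

    // Build the argv for `docker stop`, adding --time only when a
    // positive, non-default grace period has been configured.
    static List<String> buildStopCommand(String containerId, Integer graceSeconds) {
        List<String> cmd = new ArrayList<>(Arrays.asList("docker", "stop"));
        if (graceSeconds != null && graceSeconds > 0
                && graceSeconds != DOCKER_DEFAULT_GRACE_SECONDS) {
            cmd.add("--time=" + graceSeconds);
        }
        cmd.add(containerId);
        return cmd;
    }

    public static void main(String[] args) {
        System.out.println(buildStopCommand("container_1", 30));
        // [docker, stop, --time=30, container_1]
    }
}
```

In YARN itself the value would come from a NodeManager configuration property defined by the patch rather than being passed in directly.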
[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable
[ https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430842#comment-16430842 ] Eric Badger commented on YARN-7667: --- Fixing checkstyle > Docker Stop grace period should be configurable > --- > > Key: YARN-7667 > URL: https://issues.apache.org/jira/browse/YARN-7667 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-7667.001.patch, YARN-7667.002.patch, > YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch > > > {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never > called, so the stop uses the 10-second default grace period from Docker.
[jira] [Commented] (YARN-8131) Provide CLI option to DS for publishing entities into sub application
[ https://issues.apache.org/jira/browse/YARN-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430839#comment-16430839 ] Rohith Sharma K S commented on YARN-8131: - We need to provide an option in the DS client CLI, like -flow_name, -flow_version and -flow_run_id. Based on the CLI option, the DS application master can decide whether or not to write into the sub-application table. This is to test the new API, because it is not integrated anywhere else. > Provide CLI option to DS for publishing entities into sub application > - > > Key: YARN-8131 > URL: https://issues.apache.org/jira/browse/YARN-8131 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Priority: Major > > Post YARN-6936, TimelineV2Client exposes an API to publish entities into the sub > application table. We should add this CLI option in DS so that the API can be > tested.
[jira] [Commented] (YARN-8131) Provide CLI option to DS for publishing entities into sub application
[ https://issues.apache.org/jira/browse/YARN-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430813#comment-16430813 ] Haibo Chen commented on YARN-8131: -- [~rohithsharma] I don't think we should let the end user decide whether entities are posted into the sub application table or not. The framework, DistributedShell in this case, should decide. The end user of the CLI cares about the data, not about how DS stores it in ATSv2. [~vrushalic] What's your take on this? > Provide CLI option to DS for publishing entities into sub application > - > > Key: YARN-8131 > URL: https://issues.apache.org/jira/browse/YARN-8131 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Priority: Major > > Post YARN-6936, TimelineV2Client exposes an API to publish entities into the sub > application table. We should add this CLI option in DS so that the API can be > tested.
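If the flag route proposed earlier in the thread were taken, the DS client side would be a simple boolean option alongside -flow_name and friends. A stdlib-only sketch; the flag name `-publish_sub_app_entities` is hypothetical, and the option YARN-8131 actually adds may be named differently or carry a value:

```java
import java.util.Arrays;

public class DsFlagSketch {
    // Hypothetical DS client flag; not the real YARN-8131 option name.
    static final String SUB_APP_FLAG = "-publish_sub_app_entities";

    // The DS AM would branch on this to choose the sub-application
    // publish path of TimelineV2Client instead of the normal one.
    static boolean publishToSubApp(String[] args) {
        return Arrays.asList(args).contains(SUB_APP_FLAG);
    }

    public static void main(String[] args) {
        String[] cli = {"-shell_command", "sleep 120", SUB_APP_FLAG};
        System.out.println(publishToSubApp(cli)); // true
    }
}
```

Haibo's objection above is precisely that this decision should live in the framework, not in a user-facing flag, so the final patch may not look like this at all.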
[jira] [Assigned] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""
[ https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh reassigned YARN-8116: --- Assignee: Chandni Singh > Nodemanager fails with NumberFormatException: For input string: "" > -- > > Key: YARN-8116 > URL: https://issues.apache.org/jira/browse/YARN-8116 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Chandni Singh >Priority: Critical > > Steps followed. > 1) Update nodemanager debug delay config > {code} > > yarn.nodemanager.delete.debug-delay-sec > 350 > {code} > 2) Launch distributed shell application multiple times > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn jar > hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 120" > -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar > hadoop-yarn-applications-distributedshell-*.jar{code} > 3) restart NM > Nodemanager fails to start with below error. > {code} > {code:title=NM log} > 2018-03-23 21:32:14,437 INFO monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: > true > 2018-03-23 21:32:14,439 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set > as 3600. 
The logs will be aggregated every 3600 seconds > 2018-03-23 21:32:14,455 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl > failed in state INITED > java.lang.NumberFormatException: For input string: "" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:601) > at java.lang.Long.parseLong(Long.java:631) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960) > 2018-03-23 21:32:14,458 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceStop(148)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService > waiting for pending aggregation during exit > 2018-03-23 21:32:14,460 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state > INITED > 
java.lang.NumberFormatException: For input string: "" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Long.parseLong(Long.java:601) > at java.lang.Long.parseLong(Long.java:631) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350) > at > org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at >
[jira] [Commented] (YARN-7574) Add support for Node Labels on Auto Created Leaf Queue Template
[ https://issues.apache.org/jira/browse/YARN-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430802#comment-16430802 ] Hudson commented on YARN-7574: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13942 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13942/]) YARN-7574. Add support for Node Labels on Auto Created Leaf Queue (sunilg: rev 821b0de4c59156d4a65112de03ba3e7e1c88e309) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ManagedParentQueue.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/PendingAskUpdateResult.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/queuemanagement/GuaranteedOrZeroCapacityOverTimePolicy.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerAutoCreatedQueueBase.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * 
(edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueManagementDynamicEditPolicy.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AutoCreatedLeafQueue.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/Allocation.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerAutoQueueCreation.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AutoCreatedQueueManagementPolicy.java > Add support for Node Labels on Auto Created Leaf Queue Template > --- > > Key: YARN-7574 > URL: https://issues.apache.org/jira/browse/YARN-7574 > Project: Hadoop YARN > Issue Type: Sub-task > 
Components: capacity scheduler >Reporter: Suma Shivaprasad >Assignee: Suma Shivaprasad >Priority: Major > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-7574.1.patch, YARN-7574.10.patch, > YARN-7574.11.patch, YARN-7574.12.patch, YARN-7574.2.patch, YARN-7574.3.patch, > YARN-7574.4.patch, YARN-7574.5.patch, YARN-7574.6.patch, YARN-7574.7.patch, > YARN-7574.8.patch, YARN-7574.9.patch > > > YARN-7473 adds support for auto created leaf queues to inherit node label > capacities from parent queues. However, there is no support in the leaf queue > template for configuring different capacities for different node labels.
[jira] [Updated] (YARN-8133) Doc link broken for yarn-service from overview page.
[ https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-8133: Attachment: YARN-8133.01.patch > Doc link broken for yarn-service from overview page. > > > Key: YARN-8133 > URL: https://issues.apache.org/jira/browse/YARN-8133 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Rohith Sharma K S >Priority: Major > Attachments: YARN-8133.01.patch > > > I see that the documentation links are broken on the overview page. > Clicking any link on the > http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html > page causes an error. > It looks like the Overview page links to .md pages, which don't exist in the published site. > The links should point to the *.html pages instead.
[jira] [Created] (YARN-8133) Doc link broken for yarn-service from overview page.
Rohith Sharma K S created YARN-8133: --- Summary: Doc link broken for yarn-service from overview page. Key: YARN-8133 URL: https://issues.apache.org/jira/browse/YARN-8133 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Affects Versions: 3.1.0 Reporter: Rohith Sharma K S I see that the documentation links are broken on the overview page. Clicking any link on the http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html page causes an error. It looks like the Overview page links to .md pages, which don't exist in the published site. The links should point to the *.html pages instead.
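One way to picture the fix: intra-site links that target the .md sources need to target the generated .html pages instead. The actual patch may address this in the maven-site markdown processing rather than with string rewriting; this stdlib sketch only illustrates the link transformation being asked for:

```java
public class LinkFix {
    // Rewrite markdown link targets like "(QuickStart.md)" to point at
    // the generated page, "(QuickStart.html)".
    static String mdLinksToHtml(String markdown) {
        return markdown.replaceAll("\\]\\(([^)\\s]+)\\.md\\)", "]($1.html)");
    }

    public static void main(String[] args) {
        String s = "See [QuickStart](QuickStart.md) and [Examples](Examples.md).";
        System.out.println(mdLinksToHtml(s));
        // See [QuickStart](QuickStart.html) and [Examples](Examples.html).
    }
}
```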
[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable
[ https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430788#comment-16430788 ] genericqa commented on YARN-7667: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 6s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 33s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 24s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 237 unchanged - 0 fixed = 238 total (was 237) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 0s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 3m 17s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 31s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}111m 55s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.client.api.impl.TestTimelineClientV2Impl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | YARN-7667 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918160/YARN-7667.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux
[jira] [Commented] (YARN-7221) Add security check for privileged docker container
[ https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430726#comment-16430726 ] Eric Badger commented on YARN-7221: --- bq. Hi Eric Badger Jason Lowe, do we agree on the last change to check submitting user for sudo privileges instead of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user? Yep, I agree with that > Add security check for privileged docker container > -- > > Key: YARN-7221 > URL: https://issues.apache.org/jira/browse/YARN-7221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security >Affects Versions: 3.0.0, 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-7221.001.patch, YARN-7221.002.patch, > YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, > YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, > YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, > YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, > YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, > YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch > > > When a docker is running with privileges, majority of the use case is to have > some program running with root then drop privileges to another user. i.e. > httpd to start with privileged and bind to port 80, then drop privileges to > www user. > # We should add security check for submitting users, to verify they have > "sudo" access to run privileged container. > # We should remove --user=uid:gid for privileged containers. > > Docker can be launched with --privileged=true, and --user=uid:gid flag. With > this parameter combinations, user will not have access to become root user. > All docker exec command will be drop to uid:gid user to run instead of > granting privileges. User can gain root privileges if container file system > contains files that give user extra power, but this type of image is > considered as dangerous. 
Non-privileged users can launch containers with > special bits to acquire the same level of root power. Hence, we lose control of > which images should be run with --privileged, and of who has sudo rights to use > privileged container images. As a result, we should check for sudo access and > then decide whether to parameterize --privileged=true OR --user=uid:gid. This will > avoid leading developers down the wrong path.
[jira] [Commented] (YARN-7221) Add security check for privileged docker container
[ https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430715#comment-16430715 ] Jason Lowe commented on YARN-7221: -- bq. do we agree on the last change to check submitting user for sudo privileges instead of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user? Yes, that sounds like an appropriate change. > Add security check for privileged docker container > -- > > Key: YARN-7221 > URL: https://issues.apache.org/jira/browse/YARN-7221 > Project: Hadoop YARN > Issue Type: Sub-task > Components: security >Affects Versions: 3.0.0, 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-7221.001.patch, YARN-7221.002.patch, > YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, > YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, > YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, > YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, > YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, > YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch > > > When a docker is running with privileges, majority of the use case is to have > some program running with root then drop privileges to another user. i.e. > httpd to start with privileged and bind to port 80, then drop privileges to > www user. > # We should add security check for submitting users, to verify they have > "sudo" access to run privileged container. > # We should remove --user=uid:gid for privileged containers. > > Docker can be launched with --privileged=true, and --user=uid:gid flag. With > this parameter combinations, user will not have access to become root user. > All docker exec command will be drop to uid:gid user to run instead of > granting privileges. User can gain root privileges if container file system > contains files that give user extra power, but this type of image is > considered as dangerous. 
Non-privileged users can launch containers with > special bits to acquire the same level of root power. Hence, we lose control of > which images should be run with --privileged, and of who has sudo rights to use > privileged container images. As a result, we should check for sudo access and > then decide whether to parameterize --privileged=true OR --user=uid:gid. This will > avoid leading developers down the wrong path.
[jira] [Commented] (YARN-7996) Allow user supplied Docker client configurations with YARN native services
[ https://issues.apache.org/jira/browse/YARN-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430708#comment-16430708 ] Jason Lowe commented on YARN-7996: -- Thanks for the patch! Looks good to me overall. I agree with Billie's comment about clarifying what's expected in the config field. Nit: The following code should have a debug log enabled check at the front of the conditional {code} if (tokens != null && tokens.length != 0) { for (Token token : tokens) { LOG.debug("Got DT: " + token); } } {code} It's a little odd for validateDockerClientConfiguration to go through the motions to build a full URI but then limit that URI check to a particular filesystem. Is that intentional, or should it be calling Path#getFileSystem instead of assuming it should use fs.getFileSystem()? > Allow user supplied Docker client configurations with YARN native services > -- > > Key: YARN-7996 > URL: https://issues.apache.org/jira/browse/YARN-7996 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Attachments: YARN-7996.001.patch, YARN-7996.002.patch, > YARN-7996.003.patch, YARN-7996.004.patch > > > YARN-5428 added support to distributed shell for supplying a Docker client > configuration at application submission time. The auth tokens within the > client configuration are then used to pull images from private Docker > repositories/registries. Add the same support to the YARN Native Services > framework.
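The nit above asks for a debug-enabled guard so the "Got DT: " strings are not built when debug logging is off. A minimal stand-in showing the pattern; the flag and counter here are a toy logger, not the real Hadoop LOG object:

```java
import java.util.Arrays;
import java.util.List;

public class DebugGuard {
    static boolean debugEnabled = false; // stands in for LOG.isDebugEnabled()
    static int messagesBuilt = 0;

    static void logDebug(String msg) { messagesBuilt++; /* would write msg */ }

    static void logTokens(List<String> tokens) {
        // Guard first: the concatenation inside the loop is skipped
        // entirely when debug logging is off.
        if (debugEnabled && tokens != null && !tokens.isEmpty()) {
            for (String token : tokens) {
                logDebug("Got DT: " + token);
            }
        }
    }

    public static void main(String[] args) {
        logTokens(Arrays.asList("t1", "t2"));  // debug off: nothing built
        System.out.println(messagesBuilt);      // 0
        debugEnabled = true;
        logTokens(Arrays.asList("t1", "t2"));
        System.out.println(messagesBuilt);      // 2
    }
}
```

With a parameterized logger (slf4j-style `LOG.debug("Got DT: {}", token)`) the explicit guard is often unnecessary, but the quoted code concatenates strings, so the check matters.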
[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430677#comment-16430677 ] Billie Rinaldi commented on YARN-7962: -- [~wilfreds], any further comments about patch 3, given that try/finally is the recommended best practice? > Race Condition When Stopping DelegationTokenRenewer > --- > > Key: YARN-7962 > URL: https://issues.apache.org/jira/browse/YARN-7962 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Priority: Minor > Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, > YARN-7962.4.patch > > > [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java] > {code:java} > private ThreadPoolExecutor renewerService; > private void processDelegationTokenRenewerEvent( > DelegationTokenRenewerEvent evt) { > serviceStateLock.readLock().lock(); > try { > if (isServiceStarted) { > renewerService.execute(new DelegationTokenRenewerRunnable(evt)); > } else { > pendingEventQueue.add(evt); > } > } finally { > serviceStateLock.readLock().unlock(); > } > } > @Override > protected void serviceStop() { > if (renewalTimer != null) { > renewalTimer.cancel(); > } > appTokens.clear(); > allTokens.clear(); > this.renewerService.shutdown(); > {code} > {code:java} > 2018-02-21 11:18:16,253 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.util.concurrent.RejectedExecutionException: Task > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2 > rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487] > at > 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > What I think is going on here is that the {{serviceStop}} method is not > setting the {{isServiceStarted}} flag to 'false'. > Please update so that the {{serviceStop}} method grabs the > {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before > shutting down the {{renewerService}} thread pool, to avoid this condition. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
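The requested fix can be sketched as a minimal, self-contained class. This is not the actual DelegationTokenRenewer; the String event type and method bodies are simplified stand-ins. The key point is that serviceStop takes the write lock and clears the started flag before shutting the pool down, so an event racing against shutdown lands in the pending queue instead of hitting a terminated executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RenewerShutdownSketch {
    private final ReentrantReadWriteLock serviceStateLock = new ReentrantReadWriteLock();
    private boolean isServiceStarted = true;
    private final ExecutorService renewerService = Executors.newFixedThreadPool(1);
    final List<String> pendingEventQueue = new ArrayList<>();

    void processEvent(String evt) {
        serviceStateLock.readLock().lock();
        try {
            if (isServiceStarted) {
                renewerService.execute(() -> { /* renew tokens for evt */ });
            } else {
                pendingEventQueue.add(evt); // never reaches the terminated pool
            }
        } finally {
            serviceStateLock.readLock().unlock();
        }
    }

    void serviceStop() {
        // Flip the flag under the write lock BEFORE shutting down the pool,
        // so no reader can observe isServiceStarted == true afterwards.
        serviceStateLock.writeLock().lock();
        try {
            isServiceStarted = false;
        } finally {
            serviceStateLock.writeLock().unlock();
        }
        renewerService.shutdown();
    }

    public static void main(String[] args) {
        RenewerShutdownSketch s = new RenewerShutdownSketch();
        s.serviceStop();
        s.processEvent("app-finished"); // queued, no RejectedExecutionException
        System.out.println("queued=" + s.pendingEventQueue.size());
    }
}
```

Because the write lock excludes all readers, any thread inside processEvent either finishes before the flag flips (and its task is accepted by a still-live pool) or starts after it (and queues the event).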
[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable
[ https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430583#comment-16430583 ] Eric Badger commented on YARN-7667: --- [~shaneku...@gmail.com], thanks for the review! bq. Is there a reason not to set the value for yarn.nodemanager.runtime.linux.docker.stop.grace-period in yarn-default.xml to 10? Not particularly. I was just leaving it to be set by the user while using the default value set in YarnConfiguration. I'm not sure on the convention there, so I went ahead and followed your comment and updated the default in yarn-default.xml to 10. bq. I don't think the new DockerStopCommand constructor is necessary, new DockerStopCommand(containerId).setGracePeriod(dockerStopGracePeriod) would achieve the same. Fair enough. I removed the additional constructor and replaced the invocations with {{DockerStopCommand(containerId).setGracePeriod(dockerStopGracePeriod)}} > Docker Stop grace period should be configurable > --- > > Key: YARN-7667 > URL: https://issues.apache.org/jira/browse/YARN-7667 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-7667.001.patch, YARN-7667.002.patch, > YARN-7667.003.patch, YARN-7667.004.patch > > > {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never > called. So, the stop uses the 10 second default grace period from docker
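For readers following the constructor discussion, a minimal sketch of why the chained form works. This is a hypothetical stand-in, not Hadoop's real DockerStopCommand: the setter returns this, so the grace period can be chained directly onto the existing single-argument constructor, making an extra constructor redundant.

```java
public class DockerStopCommandSketch {
    private final String containerId;
    private long gracePeriod = 10; // docker's default grace period, in seconds

    public DockerStopCommandSketch(String containerId) {
        this.containerId = containerId;
    }

    public DockerStopCommandSketch setGracePeriod(long seconds) {
        this.gracePeriod = seconds;
        return this; // fluent: enables new Cmd(id).setGracePeriod(n)
    }

    @Override
    public String toString() {
        // --time is docker stop's flag for the grace period before SIGKILL.
        return "docker stop --time=" + gracePeriod + " " + containerId;
    }

    public static void main(String[] args) {
        System.out.println(new DockerStopCommandSketch("container_01").setGracePeriod(30));
    }
}
```

The fluent setter keeps the class at one constructor while still letting callers override the default in a single expression.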
[jira] [Updated] (YARN-7667) Docker Stop grace period should be configurable
[ https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7667: -- Attachment: YARN-7667.004.patch > Docker Stop grace period should be configurable > --- > > Key: YARN-7667 > URL: https://issues.apache.org/jira/browse/YARN-7667 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-7667.001.patch, YARN-7667.002.patch, > YARN-7667.003.patch, YARN-7667.004.patch > > > {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never > called. So, the stop uses the 10 second default grace period from docker
[jira] [Updated] (YARN-8132) Final Status of applications shown as UNDEFINED in ATS app queries
[ https://issues.apache.org/jira/browse/YARN-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charan Hebri updated YARN-8132: --- Description: Final Status is shown as UNDEFINED for applications that are KILLED/FAILED. A sample request/response with INFO field for an application, {noformat} 2018-04-09 13:10:02,126 INFO reader.TimelineReaderWebServices (TimelineReaderWebServices.java:getApp(1693)) - Received URL /ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO from user hrt_qa 2018-04-09 13:10:02,156 INFO reader.TimelineReaderWebServices (TimelineReaderWebServices.java:getApp(1716)) - Processed URL /ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO (Took 30 ms.){noformat} {noformat} { "metrics": [], "events": [], "createdtime": 1523263360719, "idprefix": 0, "id": "application_1523259757659_0003", "type": "YARN_APPLICATION", "info": { "YARN_APPLICATION_CALLER_CONTEXT": "CLI", "YARN_APPLICATION_DIAGNOSTICS_INFO": "Application application_1523259757659_0003 was killed by user xxx_xx at XXX.XXX.XXX.XXX", "YARN_APPLICATION_FINAL_STATUS": "UNDEFINED", "YARN_APPLICATION_NAME": "Sleep job", "YARN_APPLICATION_USER": "hrt_qa", "YARN_APPLICATION_UNMANAGED_APPLICATION": false, "FROM_ID": "yarn-cluster!hrt_qa!test_flow!1523263360719!application_1523259757659_0003", "UID": "yarn-cluster!application_1523259757659_0003", "YARN_APPLICATION_VIEW_ACLS": " ", "YARN_APPLICATION_SUBMITTED_TIME": 1523263360718, "YARN_AM_CONTAINER_LAUNCH_COMMAND": [ "$JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir= -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Dhdp.version=3.0.0.0-1163 -Xmx819m -Dhdp.version=3.0.0.0-1163 org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/stdout 2>/stderr " ], "YARN_APPLICATION_QUEUE": "default", "YARN_APPLICATION_TYPE": "MAPREDUCE", "YARN_APPLICATION_PRIORITY": 0, 
"YARN_APPLICATION_LATEST_APP_ATTEMPT": "appattempt_1523259757659_0003_01", "YARN_APPLICATION_TAGS": [ "timeline_flow_name_tag:test_flow" ], "YARN_APPLICATION_STATE": "KILLED" }, "configs": {}, "isrelatedto": {}, "relatesto": {} }{noformat} This is different to what the Resource Manager reports. For KILLED applications the final status is KILLED and for FAILED applications it is FAILED. This behavior is seen in ATSv2 as well as older versions of ATS. > Final
[jira] [Created] (YARN-8132) Final Status of applications shown as UNDEFINED in ATS app queries
Charan Hebri created YARN-8132: -- Summary: Final Status of applications shown as UNDEFINED in ATS app queries Key: YARN-8132 URL: https://issues.apache.org/jira/browse/YARN-8132 Project: Hadoop YARN Issue Type: Bug Components: ATSv2, timelineservice Reporter: Charan Hebri Final Status is shown as UNDEFINED for applications that are KILLED/FAILED. A sample request/response with INFO field for an application, {noformat} 2018-04-09 13:10:02,126 INFO reader.TimelineReaderWebServices (TimelineReaderWebServices.java:getApp(1693)) - Received URL /ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO from user hrt_qa 2018-04-09 13:10:02,156 INFO reader.TimelineReaderWebServices (TimelineReaderWebServices.java:getApp(1716)) - Processed URL /ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO (Took 30 ms.){noformat} {noformat} { "metrics": [], "events": [], "createdtime": 1523263360719, "idprefix": 0, "id": "application_1523259757659_0003", "type": "YARN_APPLICATION", "info": { "YARN_APPLICATION_CALLER_CONTEXT": "CLI", "YARN_APPLICATION_DIAGNOSTICS_INFO": "Application application_1523259757659_0003 was killed by user xxx_xx at XXX.XXX.XXX.XXX", "YARN_APPLICATION_FINAL_STATUS": "UNDEFINED", "YARN_APPLICATION_NAME": "Sleep job", "YARN_APPLICATION_USER": "hrt_qa", "YARN_APPLICATION_UNMANAGED_APPLICATION": false, "FROM_ID": "yarn-cluster!hrt_qa!test_flow!1523263360719!application_1523259757659_0003", "UID": "yarn-cluster!application_1523259757659_0003", "YARN_APPLICATION_VIEW_ACLS": " ", "YARN_APPLICATION_SUBMITTED_TIME": 1523263360718, "YARN_AM_CONTAINER_LAUNCH_COMMAND": [ "$JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir= -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Dhdp.version=3.0.0.0-1163 -Xmx819m -Dhdp.version=3.0.0.0-1163 org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/stdout 2>/stderr " ], "YARN_APPLICATION_QUEUE": 
"default", "YARN_APPLICATION_TYPE": "MAPREDUCE", "YARN_APPLICATION_PRIORITY": 0, "YARN_APPLICATION_LATEST_APP_ATTEMPT": "appattempt_1523259757659_0003_01", "YARN_APPLICATION_TAGS": [ "timeline_flow_name_tag:test_flow" ], "YARN_APPLICATION_STATE": "KILLED" }, "configs": {}, "isrelatedto": {}, "relatesto": {} }{noformat} This is different to what the Resource Manager reports. For KILLED applications the final status is KILLED and for FAILED applications it is FAILED. This behavior is seen in ATSv2 as well as older versions of ATS.