[jira] [Commented] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431753#comment-16431753
 ] 

genericqa commented on YARN-8133:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
33m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 45m 26s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8133 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918311/YARN-8133.02.patch |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux c31446c34a67 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0006346 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 409 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20283/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-8133.01.patch, YARN-8133.02.patch
>
>
> I see that the documentation links are broken on the overview page. 
> Clicking any link on the 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  page causes an error. 
> It looks like the Overview page links to .md pages, which don't exist on the 
> published site; it should link to the corresponding *.html pages.
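Illustrative only (the actual patch may differ): the class of fix implied here is
changing relative links in the markdown sources so the generated site points at
rendered pages rather than raw .md files:

{code}
Before (broken on the published site):  [QuickStart](QuickStart.md)
After (resolves correctly):             [QuickStart](QuickStart.html)
{code}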



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8137) Parallelize node addition in SLS

2018-04-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431712#comment-16431712
 ] 

genericqa commented on YARN-8137:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 54s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
17s{color} | {color:green} hadoop-sls in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8137 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918307/YARN-8137.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ce62db1c3cdf 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0006346 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20282/testReport/ |
| Max. process+thread count | 469 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20282/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Parallelize node addition in SLS
> 
>
>

[jira] [Updated] (YARN-7930) Add configuration to initialize RM with configured labels.

2018-04-09 Thread Abhishek Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-7930:

Attachment: YARN-7930.004.patch

> Add configuration to initialize RM with configured labels.
> --
>
> Key: YARN-7930
> URL: https://issues.apache.org/jira/browse/YARN-7930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-7930.001.patch, YARN-7930.002.patch, 
> YARN-7930.003.patch, YARN-7930.004.patch
>
>
> At present, the only way to create labels is using the admin API. Sometimes 
> there is a requirement to start the cluster with pre-configured node labels. 
> This JIRA introduces YARN configurations to start the RM with predefined node 
> labels.
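A minimal sketch of what such a configuration could look like; the property name
and values below are hypothetical, not necessarily what the patch introduces:

{code:xml}
<!-- yarn-site.xml: hypothetical property seeding node labels at RM startup -->
<property>
  <name>yarn.node-labels.initial-labels</name>
  <value>GPU,SSD</value>
</property>
{code}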



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8126) [Follow up] Support auto-spawning of admin configured services during bootstrap of rm

2018-04-09 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431705#comment-16431705
 ] 

Rohith Sharma K S commented on YARN-8126:
-

bq. If I understand correctly yarn.service.system-service.dir is a 
cluster-specific config, right?
Yes. How about adding a new "Quick start" section? 

> [Follow up] Support auto-spawning of admin configured services during 
> bootstrap of rm
> -
>
> Key: YARN-8126
> URL: https://issues.apache.org/jira/browse/YARN-8126
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8126.001.patch
>
>
> YARN-8048 adds support for auto-spawning of admin-configured services during 
> bootstrap of the RM. 
> This JIRA is to follow up on some of the comments discussed in YARN-8048. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-04-09 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431701#comment-16431701
 ] 

Eric Yang commented on YARN-7530:
-

[~leftnoteasy] I think we would like to keep yarn-service-api inside the 
yarn-application/yarn-service subtree instead of trying to separate the project 
into various parts of YARN.  This will ensure that yarn-service is treated as a 
kind of YARN application and remains completely optional from YARN's point of 
view.  It will also be easier to develop, rather than having to touch 5 
different sub-projects to change code.

> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---
>
> Key: YARN-7530
> URL: https://issues.apache.org/jira/browse/YARN-7530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Trivial
> Fix For: yarn-native-services
>
> Attachments: YARN-7530.001.patch
>
>
> Hadoop-yarn-services-api is currently a parallel project to the 
> hadoop-yarn-services project.  It would be better if hadoop-yarn-services-api 
> were part of hadoop-yarn-services for correctness.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-09 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-8133:

Attachment: YARN-8133.02.patch

> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-8133.01.patch, YARN-8133.02.patch
>
>
> I see that the documentation links are broken on the overview page. 
> Clicking any link on the 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  page causes an error. 
> It looks like the Overview page links to .md pages, which don't exist on the 
> published site; it should link to the corresponding *.html pages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8100) Support API interface to query cluster attributes and attribute to nodes

2018-04-09 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431690#comment-16431690
 ] 

Bibin A Chundatt commented on YARN-8100:


Thanks [~Naganarasimha] for review and commit

> Support API interface to query cluster attributes and attribute to nodes
> 
>
> Key: YARN-8100
> URL: https://issues.apache.org/jira/browse/YARN-8100
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Fix For: YARN-3409
>
> Attachments: YARN-8100-YARN-3409.001.patch, 
> YARN-8100-YARN-3409.002.patch, YARN-8100-YARN-3409.003.patch, 
> YARN-8100-YARN-3409.004.patch, YARN-8100-YARN-3409.005.patch, 
> YARN-8100-YARN-3409.006.patch, YARN-8100-YARN-3409.007.patch
>
>
> This JIRA is to add APIs to query cluster node attributes and the 
> attributes-to-nodes mapping: 
> *YarnClient*
> {code}
> getAttributesToNodes()
> getAttributesToNodes(Set attribute)
> getClusterAttributes()
> {code}
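For context, a minimal usage sketch of these client APIs; the generics and
return types below are assumptions (the quoted snippet elides them), so treat
this as illustrative rather than the committed signatures:

{code:java}
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.NodeAttribute;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeAttributeQueryDemo {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      // Fetch every attribute known to the cluster.
      Set<NodeAttribute> attrs = client.getClusterAttributes();
      // Map each attribute to the hosts that carry it (assumed return type).
      Map<NodeAttribute, Set<String>> byAttr = client.getAttributesToNodes(attrs);
      byAttr.forEach((attr, hosts) -> System.out.println(attr + " -> " + hosts));
    } finally {
      client.stop();
    }
  }
}
{code}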



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-09 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431680#comment-16431680
 ] 

Arun Suresh commented on YARN-8135:
---

Interesting! I'd like to help out; awaiting the design doc.
I think this should be renamed to YARN-Submarine though.

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineers / data scientists to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access to data/models in HDFS and other storage.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support running distributed Tensorflow jobs with simple configs.
>  - Support running user-specified Docker images.
>  - Support specifying GPUs and other resources.
>  - Support launching TensorBoard if the user requests it.
>  - Support customized DNS names for roles (like tensorboard.$user.$domain:6006).
> *Why this name?*
>  - Because a submarine is the only vehicle that lets humans explore deep 
> places. B-)
> Compared to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU isolation in the XLearning project is achieved by a patched YARN, which 
> differs from the community's GPU isolation solution.
> **XLearning needs a few modifications to read ClusterSpec from the environment.
> *References:*
>  - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8137) Parallelize node addition in SLS

2018-04-09 Thread Abhishek Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-8137:

Issue Type: Sub-task  (was: Bug)
Parent: YARN-5065

> Parallelize node addition in SLS
> 
>
> Key: YARN-8137
> URL: https://issues.apache.org/jira/browse/YARN-8137
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8137.001.patch
>
>
> Right now, nodes are added sequentially, and it can take a long time if there 
> are a large number of nodes. With this change, nodes will be added in parallel, 
> thus reducing node addition time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8137) Parallelize node addition in SLS

2018-04-09 Thread Abhishek Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-8137:

Attachment: YARN-8137.001.patch

> Parallelize node addition in SLS
> 
>
> Key: YARN-8137
> URL: https://issues.apache.org/jira/browse/YARN-8137
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8137.001.patch
>
>
> Right now, nodes are added sequentially, and it can take a long time if there 
> are a large number of nodes. With this change, nodes will be added in parallel, 
> thus reducing node addition time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8137) Parallelize node addition in SLS

2018-04-09 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-8137:
---

 Summary: Parallelize node addition in SLS
 Key: YARN-8137
 URL: https://issues.apache.org/jira/browse/YARN-8137
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Right now, nodes are added sequentially, and it can take a long time if there 
are a large number of nodes. With this change, nodes will be added in parallel, 
thus reducing node addition time.
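A minimal sketch of the approach, assuming nothing about the actual patch beyond
"submit per-node registration to a thread pool and wait for completion";
addNode() below is a stand-in for SLS's node-registration logic:

{code:java}
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ParallelNodeAdder {
  static void addNodes(List<String> nodeIds, int poolSize) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    CountDownLatch done = new CountDownLatch(nodeIds.size());
    for (String nodeId : nodeIds) {
      pool.execute(() -> {
        try {
          addNode(nodeId);  // register one simulated NM; previously sequential
        } finally {
          done.countDown();
        }
      });
    }
    done.await();           // block until every node has been added
    pool.shutdown();
  }

  private static void addNode(String nodeId) {
    // Stand-in for the SLS-specific registration of a single node.
  }
}
{code}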



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before

2018-04-09 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431661#comment-16431661
 ] 

Tao Yang commented on YARN-6629:


Hi, [~hunanmei...@gmail.com]. Yes, it's the same question.

Attached a new patch for branch-2, which can also be applied cleanly to 
branch-2.9 and branch-2.9.0. The new patch is nearly the same as the trunk one. 
[~leftnoteasy], please help to review and commit. Thanks!
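For readers following the stack trace below: the crash is a map lookup on a key
that was removed concurrently, so the guard is of roughly this shape (names
taken from the quoted trace, body hypothetical; the real patch may differ):

{code:java}
// Hypothetical sketch inside AppSchedulingInfo#allocate (types simplified).
SchedulingPlacementSet<?> ps = schedulerKeyToPlacementSets.get(schedulerRequestKey);
if (ps == null) {
  // The resource request was removed after the proposal was accepted but
  // before it was applied; bail out instead of dereferencing null.
  return false;
}
ps.allocate(schedulerKey, type, node);
{code}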

> NPE occurred when container allocation proposal is applied but its resource 
> requests are removed before
> ---
>
> Key: YARN-6629
> URL: https://issues.apache.org/jira/browse/YARN-6629
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.1.0
>
> Attachments: YARN-6629.001.patch, YARN-6629.002.patch, 
> YARN-6629.003.patch, YARN-6629.004.patch, YARN-6629.005.patch, 
> YARN-6629.006.patch, YARN-6629.branch-2.001.patch
>
>
> I wrote a test case to reproduce another problem for branch-2 and found a new 
> NPE error; log: 
> {code}
> FATAL event.EventDispatcher (EventDispatcher.java:run(75)) - Error in 
> handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:446)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:516)
> at 
> org.apache.hadoop.yarn.client.TestNegativePendingResource$1.answer(TestNegativePendingResource.java:225)
> at 
> org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31)
> at org.mockito.internal.MockHandler.handle(MockHandler.java:97)
> at 
> org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp$$EnhancerByMockitoWithCGLIB$$29eb8afc.apply()
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2281)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1247)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1236)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1325)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:987)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1367)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:143)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Reproduce this error in chronological order:
> 1. AM started and requested 1 container with schedulerRequestKey#1 : 
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests 
> Added schedulerRequestKey#1 into schedulerKeyToPlacementSets
> 2. Scheduler allocated 1 container for this request and accepted the proposal
> 3. AM removed this request
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests --> 
> AppSchedulingInfo#addToPlacementSets --> 
> AppSchedulingInfo#updatePendingResources
> Removed schedulerRequestKey#1 from schedulerKeyToPlacementSets
> 4. Scheduler applied this proposal
> CapacityScheduler#tryCommit --> FiCaSchedulerApp#apply --> 
> AppSchedulingInfo#allocate 
> NPE is thrown when calling 
> schedulerKeyToPlacementSets.get(schedulerRequestKey).allocate(schedulerKey, 
> type, node);



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To 

[jira] [Updated] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before

2018-04-09 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-6629:
---
Attachment: YARN-6629.branch-2.001.patch

> NPE occurred when container allocation proposal is applied but its resource 
> requests are removed before
> ---
>
> Key: YARN-6629
> URL: https://issues.apache.org/jira/browse/YARN-6629
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.1.0
>
> Attachments: YARN-6629.001.patch, YARN-6629.002.patch, 
> YARN-6629.003.patch, YARN-6629.004.patch, YARN-6629.005.patch, 
> YARN-6629.006.patch, YARN-6629.branch-2.001.patch
>
>
> I wrote a test case to reproduce another problem for branch-2 and found a new 
> NPE error; log: 
> {code}
> FATAL event.EventDispatcher (EventDispatcher.java:run(75)) - Error in 
> handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:446)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:516)
> at 
> org.apache.hadoop.yarn.client.TestNegativePendingResource$1.answer(TestNegativePendingResource.java:225)
> at 
> org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31)
> at org.mockito.internal.MockHandler.handle(MockHandler.java:97)
> at 
> org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp$$EnhancerByMockitoWithCGLIB$$29eb8afc.apply()
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2281)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1247)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1236)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1325)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:987)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1367)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:143)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Reproduce this error in chronological order:
> 1. AM started and requested 1 container with schedulerRequestKey#1 : 
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests 
> Added schedulerRequestKey#1 into schedulerKeyToPlacementSets
> 2. Scheduler allocated 1 container for this request and accepted the proposal
> 3. AM removed this request
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests --> 
> AppSchedulingInfo#addToPlacementSets --> 
> AppSchedulingInfo#updatePendingResources
> Removed schedulerRequestKey#1 from schedulerKeyToPlacementSets
> 4. Scheduler applied this proposal
> CapacityScheduler#tryCommit --> FiCaSchedulerApp#apply --> 
> AppSchedulingInfo#allocate 
> NPE is thrown when calling 
> schedulerKeyToPlacementSets.get(schedulerRequestKey).allocate(schedulerKey, 
> type, node);



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8126) [Follow up] Support auto-spawning of admin configured services during bootstrap of rm

2018-04-09 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431607#comment-16431607
 ] 

Gour Saha commented on YARN-8126:
-

[~rohithsharma] the patch looks good. A few minor comments -

h5. SystemServiceManagerImpl.java
getbadDirSkipCounter: make the b in bad uppercase, i.e. getBadDirSkipCounter.

h5. Configurations.md
All service-AM-specific configs go here. If I understand correctly, 
{{yarn.service.system-service.dir}} is a cluster-specific config, right?

Also, thanks for deleting TestSystemServiceManager.java, which had all the 
upgrade-specific tests. I think I missed this in my first-round review :)

> [Follow up] Support auto-spawning of admin configured services during 
> bootstrap of rm
> -
>
> Key: YARN-8126
> URL: https://issues.apache.org/jira/browse/YARN-8126
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8126.001.patch
>
>
> YARN-8048 adds support for auto-spawning of admin-configured services during 
> bootstrap of the RM. 
> This JIRA is to follow up on some of the comments discussed in YARN-8048. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7142) Support placement policy in yarn native services

2018-04-09 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429610#comment-16429610
 ] 

Weiwei Yang edited comment on YARN-7142 at 4/10/18 2:08 AM:


Hi [~gsaha]/[~leftnoteasy]

Thanks for back-porting this to branch-3.1. Not related to this task, but I have 
a question about the format of the placement policy in the yaml file. It looks 
more like an interpretation of how we specify placement constraints using the 
Java API. I think we should be able to support a simple PC language, by specifying 
something like:
{code:java}
notin,node,foo
{code}
see more in [this 
doc|https://issues.apache.org/jira/secure/attachment/12911872/Placement%20Constraint%20Expression%20Syntax%20Specification.pdf]
 in YARN-7921. I know this is only used in distributed shell as a demo, but I 
think if we find this easier to write, maybe we can use such an expression 
here too? Just want to know your opinion.

Thanks


was (Author: cheersyang):
Hi [~gsaha]/[~leftnoteasy]

Thanks for back-porting this to branch-3.1. Not related to this task, but I have 
a question about the format of the placement policy in the yaml file. It looks 
more like an interpretation of how we specify placement constraints using the 
Java API. I think we should be able to support a simple PC language, by specifying 
something like:
{code:java}
notin,node,foo
{code}
see more in [^Placement Constraint Expression Syntax Specification.pdf] in 
YARN-7921. I know this is only used in distributed shell as a demo, but I think 
if we find this easier to write, maybe we can use such an expression here too? 
Just want to know your opinion.

Thanks
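A tiny illustration of why the compact form is attractive: it decomposes into
(operator, scope, allocation tag) with a one-line split. The class and field
names below are invented for illustration, not the YARN-7921 implementation:

{code:java}
// Hypothetical parser for the compact "op,scope,tag" form, e.g. "notin,node,foo".
final class SimplePcExpr {
  final String op;     // e.g. "notin" or "in"
  final String scope;  // e.g. "node" or "rack"
  final String tag;    // allocation tag, e.g. "foo"

  SimplePcExpr(String expr) {
    String[] parts = expr.split(",", 3);
    if (parts.length != 3) {
      throw new IllegalArgumentException("expected op,scope,tag but got: " + expr);
    }
    op = parts[0];
    scope = parts[1];
    tag = parts[2];
  }
}
{code}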

> Support placement policy in yarn native services
> 
>
> Key: YARN-7142
> URL: https://issues.apache.org/jira/browse/YARN-7142
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7142-branch-3.1.004.patch, YARN-7142.001.patch, 
> YARN-7142.002.patch, YARN-7142.003.patch, YARN-7142.004.patch
>
>
> Placement policy exists in the API but is not implemented yet.
> I have filed YARN-8074 to move the composite constraints implementation out 
> of this phase-1 implementation of placement policy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7088) Fix application start time and add submit time to UIs

2018-04-09 Thread Kanwaljeet Sachdev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kanwaljeet Sachdev updated YARN-7088:
-
Attachment: YARN-7088.014.patch

> Fix application start time and add submit time to UIs
> -
>
> Key: YARN-7088
> URL: https://issues.apache.org/jira/browse/YARN-7088
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Abdullah Yousufi
>Assignee: Kanwaljeet Sachdev
>Priority: Major
> Attachments: YARN-7088.001.patch, YARN-7088.002.patch, 
> YARN-7088.003.patch, YARN-7088.004.patch, YARN-7088.005.patch, 
> YARN-7088.006.patch, YARN-7088.007.patch, YARN-7088.008.patch, 
> YARN-7088.009.patch, YARN-7088.010.patch, YARN-7088.011.patch, 
> YARN-7088.012.patch, YARN-7088.013.patch, YARN-7088.014.patch
>
>
> Currently, the start time in the old and new UI actually shows the app 
> submission time. There should actually be two different fields; one for the 
> app's submission and one for its start, as well as the elapsed pending time 
> between the two.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-09 Thread Keqiu Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431580#comment-16431580
 ] 

Keqiu Hu commented on YARN-8135:


[~leftnoteasy],
{quote}Since Tensorflow supports reading from HDFS, ideally every platform can 
support this :). What I meant here is that TF reading HDFS needs lots of 
configuration, and needs some specific optimizations / considerations to make 
HDFS access from a Docker container easier. Our ongoing prototype covers some of 
this problem. 
{quote}
I don't think it would be hard to make HDFS access from a Docker container work, 
though. But it's worth mentioning data locality, which is not possible with the 
Kubeflow solution :).

 Looking forward to the design doc, will comment more later.

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineers / data scientists to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access to data/models in HDFS and other storage.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support running distributed Tensorflow jobs with simple configs.
>  - Support running user-specified Docker images.
>  - Support specifying GPUs and other resources.
>  - Support launching TensorBoard if the user requests it.
>  - Support customized DNS names for roles (like tensorboard.$user.$domain:6006).
> *Why this name?*
>  - Because a submarine is the only vehicle that lets humans explore deep 
> places. B-)
> Compared to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU isolation in the XLearning project is achieved by a patched YARN, which 
> differs from the community's GPU isolation solution.
> **XLearning needs a few modifications to read ClusterSpec from the environment.
> *References:*
>  - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8103) Add CLI interface to query node attributes

2018-04-09 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431562#comment-16431562
 ] 

Naganarasimha G R commented on YARN-8103:
-

[~bibinchundatt], you were right: as discussed offline, I wanted to put the 
above comment in YARN-8104, and the split seems fine. Also, as discussed in the 
meeting:
 # The cluster CLI will list the cluster attributes.
 # The attributes CLI provides an API to get the mapping of attribute(s) to 
nodes and the values configured.
 # The node CLI should provide the attributes configured for a node and the 
values mapped for each attribute.

The only point of discussion here is: should the attributes CLI also list all 
cluster attributes? Though a duplicate, it would be helpful for a user. (A 
purely hypothetical sketch of the split follows below.)
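To make the three-way split concrete, here are invented invocations; none of 
these flags exist, they just mirror the three bullets above:

{code}
# 1. cluster CLI: list every attribute known to the cluster
yarn cluster -list-node-attributes
# 2. attributes CLI: map attribute(s) to nodes and the configured values
yarn node-attributes -attributes-to-nodes -attributes example.io/gpu
# 3. node CLI: attributes configured on a single node, with their values
yarn node -show-attributes node1.example.com:8041
{code}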

 

> Add CLI interface to  query node attributes
> ---
>
> Key: YARN-8103
> URL: https://issues.apache.org/jira/browse/YARN-8103
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
>
> YARN-8100 will add an API interface for querying the attributes. This adds a 
> CLI interface for querying node attributes for each node and listing all 
> attributes in the cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8103) Add CLI interface to query node attributes

2018-04-09 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-8103:

Description: YARN-8100 will add API interface for querying the attributes. 
CLI interface for querying node attributes for each nodes and list all 
attributes in cluster.  (was:  YARN-8100 will adds  API interface for querying 
the attributes. CLI interface for querying node attributes for each nodes and 
list all attributes in cluster.)

> Add CLI interface to  query node attributes
> ---
>
> Key: YARN-8103
> URL: https://issues.apache.org/jira/browse/YARN-8103
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
>
> YARN-8100 will add an API interface for querying the attributes. This adds a 
> CLI interface for querying node attributes for each node and listing all 
> attributes in the cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8104) Add API to fetch node to attribute mapping

2018-04-09 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431559#comment-16431559
 ] 

Naganarasimha G R commented on YARN-8104:
-

Hi [~bibinchundatt], now that YARN-8100 is in, can you rebase this?

Also, there was one point I thought of adding in YARN-8100: 
GetAttributesToNodesResponseProto could have had the field attributeToNodes 
named attributesToNodes instead.

Can you incorporate that here?

> Add API to fetch node to attribute mapping
> --
>
> Key: YARN-8104
> URL: https://issues.apache.org/jira/browse/YARN-8104
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8104-YARN-3409.001.patch
>
>
> Add node/host to attribute mapping in yarn client API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8118) Better utilize gracefully decommissioning node managers

2018-04-09 Thread Karthik Palaniappan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431557#comment-16431557
 ] 

Karthik Palaniappan commented on YARN-8118:
---

Sure – I think I get the use case you guys are describing – I'm just trying to 
understand why that's different from option #2 (wait for running containers to 
finish, then decommission the node immediately after).

Is the idea that those 20 minute containers would drain shuffle from 
decommissioning nodes faster than the 10 minute timeout? So then Jason's 
comment about gracefully decommissioning on a "sufficiently large cluster" 
makes sense. So as an admin you just need to set this timeout to enough time to 
finish in-progress containers, finish the current stage (e.g. the map stage), 
and at least start all tasks in the next stage (e.g. the reduce stage) to drain 
shuffle. But you don't necessarily need to wait for the entire application to 
finish.

I still think option #2 and option #3 are both valid secondary use cases, so 
I'm inclined to make an enum parameter for "graceful decommission strategy". In 
terms of plumbing the flag through, using XML config is by far the easiest. But 
I can see an argument that this should be a parameter on a per-decommission-rpc 
basis. Thoughts?
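A sketch of the enum being floated; the names are invented for illustration, and
the values loosely mirror the options discussed in this thread:

{code:java}
// Hypothetical "graceful decommission strategy" knob.
public enum GracefulDecommissionStrategy {
  WAIT_FOR_APPS,        // current behavior: wait for in-progress applications
  WAIT_FOR_CONTAINERS,  // option #2: wait for running containers, then decommission
  SCHEDULE_IN_PROGRESS  // proposal: keep scheduling containers from in-progress apps
}
{code}

Whether this is plumbed through XML config or as a per-decommission RPC
parameter is exactly the open question above.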

> Better utilize gracefully decommissioning node managers
> ---
>
> Key: YARN-8118
> URL: https://issues.apache.org/jira/browse/YARN-8118
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.8.2
> Environment: * Google Compute Engine (Dataproc)
>  * Java 8
>  * Hadoop 2.8.2 using client-mode graceful decommissioning
>Reporter: Karthik Palaniappan
>Priority: Major
> Attachments: YARN-8118-branch-2.001.patch
>
>
> Proposal design doc with background + details (please comment directly on 
> doc): 
> [https://docs.google.com/document/d/1hF2Bod_m7rPgSXlunbWGn1cYi3-L61KvQhPlY9Jk9Hk/edit#heading=h.ab4ufqsj47b7]
> tl;dr Right now, DECOMMISSIONING nodes must wait for in-progress applications 
> to complete before shutting down, but they cannot run new containers from 
> those in-progress applications. This is wasteful, particularly in 
> environments where you are billed by resource usage (e.g. EC2).
> Proposal: YARN should schedule containers from in-progress applications on 
> DECOMMISSIONING nodes, but should still avoid scheduling containers from new 
> applications. That will make in-progress applications complete faster and let 
> nodes decommission faster. Overall, this should be cheaper.
> I have a working patch without unit tests that's surprisingly just a few real 
> lines of code (patch 001). If folks are happy with the proposal, I'll write 
> unit tests and also write a patch targeted at trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8079) Support specify files to be downloaded (localized) before containers launched by YARN

2018-04-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8079:
-
Summary: Support specify files to be downloaded (localized) before 
containers launched by YARN  (was: YARN native service should respect source 
file of ConfigFile inside Service/Component spec)

> Support specify files to be downloaded (localized) before containers launched 
> by YARN
> -
>
> Key: YARN-8079
> URL: https://issues.apache.org/jira/browse/YARN-8079
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8079.001.patch, YARN-8079.002.patch, 
> YARN-8079.003.patch, YARN-8079.004.patch, YARN-8079.005.patch
>
>
> Currently, {{srcFile}} is not respected. {{ProviderUtils}} doesn't properly 
> read srcFile; instead it always constructs {{remoteFile}} by using 
> compInstanceDir and the fileName of {{destFile}}:
> {code}
> Path remoteFile = new Path(compInstanceDir, fileName);
> {code} 
> To me it is a common use case in which services have files that already exist 
> in HDFS and need them localized when components get launched. (For example, if 
> we want to serve a Tensorflow model, we need to localize the Tensorflow model 
> (typically not huge, less than a GB) to local disk. Otherwise the launched 
> Docker container has to access HDFS.)
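A minimal sketch of the change implied, assuming the ConfigFile record exposes
the source path via getSrcFile(); simplified from what ProviderUtils would
actually need:

{code:java}
// Hypothetical: prefer the user-supplied srcFile when present, falling back to
// the per-component-instance directory only when it is absent.
Path remoteFile;
String srcFile = configFile.getSrcFile();
if (srcFile != null && !srcFile.isEmpty()) {
  remoteFile = new Path(srcFile);                    // e.g. an existing HDFS path
} else {
  remoteFile = new Path(compInstanceDir, fileName);  // current behavior
}
{code}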



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-04-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431548#comment-16431548
 ] 

Wangda Tan commented on YARN-7530:
--

A quick proposal for this:

 
- ApiServerClient/ServiceClient -> yarn-client
- ApiServer/WebApp -> yarn-server/native-service
- hadoop-yarn-services-core/api -> yarn-api/common

> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---
>
> Key: YARN-7530
> URL: https://issues.apache.org/jira/browse/YARN-7530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Trivial
> Fix For: yarn-native-services
>
> Attachments: YARN-7530.001.patch
>
>
> Hadoop-yarn-services-api is currently a parallel project to the 
> hadoop-yarn-services project.  It would be better if hadoop-yarn-services-api 
> were part of hadoop-yarn-services for correctness.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8037) CGroupsResourceCalculator logs excessive warnings on container relaunch

2018-04-09 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431526#comment-16431526
 ] 

Miklos Szegedi commented on YARN-8037:
--

Thank you, [~shaneku...@gmail.com]. How about hashing the stack trace of the 
exception and reporting it only if it has not been seen before?
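A minimal sketch of that idea, assuming a long-lived owner for the seen-set; the
helper and its names are invented for illustration:

{code:java}
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.slf4j.Logger;

final class LogOnce {
  // Hashes of stack traces we have already reported at WARN level.
  private static final Set<Integer> SEEN = ConcurrentHashMap.newKeySet();

  static void warnOnce(Logger log, String msg, Exception e) {
    int key = Arrays.hashCode(e.getStackTrace());  // hash the trace, not the message
    if (SEEN.add(key)) {
      log.warn(msg, e);                            // first sighting: full warning
    } else {
      log.debug(msg, e);                           // repeats demoted to debug
    }
  }
}
{code}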

> CGroupsResourceCalculator logs excessive warnings on container relaunch
> ---
>
> Key: YARN-8037
> URL: https://issues.apache.org/jira/browse/YARN-8037
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Major
>
> When a container is relaunched, the old process no longer exists. When using 
> the {{CGroupsResourceCalculator}}, this results in the warning and exception 
> below being logged every second until the relaunch occurs, which is excessive 
> and fills up the logs.
> {code:java}
> 2018-03-16 14:30:33,438 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
>  Failed to parse 12844
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the 
> interim 12844
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: 
> /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_e01_1521209613260_0002_01_02/cpuacct.stat
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more
> 2018-03-16 14:30:33,438 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
>  Failed to parse cgroups 
> /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.memsw.usage_in_bytes
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the 
> interim 12844
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: 
> /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.usage_in_bytes
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more{code}
> We should consider moving the exception to debug to reduce the noise at a 
> minimum. Alternatively, it may make sense to stop the existing 
> {{MonitoringThread}} during relaunch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8118) Better utilize gracefully decommissioning node managers

2018-04-09 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431524#comment-16431524
 ] 

Robert Kanter commented on YARN-8118:
-

Thanks for your ideas [~Karthik Palaniappan].

Consider this scenario: you want to gracefully decommission a node with a 
timeout of 10 minutes.  Suppose you have a job that has containers which 
normally take 20 minutes to run.  At this point, we wouldn't want to start any 
of those containers on that node because they're not going to finish before the 
decom timeout ends, so they'd just get killed halfway through, instead of 
running on another node, which would be faster overall.

I'm fine with adding an option for the behavior you're describing, but I don't 
think we can change the default behavior here (it's also not a "bugfix" like 
your design doc suggests; as [~jlowe], [~djp], and my above scenario show, 
there are valid use cases for the current behavior).  

> Better utilize gracefully decommissioning node managers
> ---
>
> Key: YARN-8118
> URL: https://issues.apache.org/jira/browse/YARN-8118
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.8.2
> Environment: * Google Compute Engine (Dataproc)
>  * Java 8
>  * Hadoop 2.8.2 using client-mode graceful decommissioning
>Reporter: Karthik Palaniappan
>Priority: Major
> Attachments: YARN-8118-branch-2.001.patch
>
>
> Proposal design doc with background + details (please comment directly on 
> doc): 
> [https://docs.google.com/document/d/1hF2Bod_m7rPgSXlunbWGn1cYi3-L61KvQhPlY9Jk9Hk/edit#heading=h.ab4ufqsj47b7]
> tl;dr Right now, DECOMMISSIONING nodes must wait for in-progress applications 
> to complete before shutting down, but they cannot run new containers from 
> those in-progress applications. This is wasteful, particularly in 
> environments where you are billed by resource usage (e.g. EC2).
> Proposal: YARN should schedule containers from in-progress applications on 
> DECOMMISSIONING nodes, but should still avoid scheduling containers from new 
> applications. That will make in-progress applications complete faster and let 
> nodes decommission faster. Overall, this should be cheaper.
> I have a working patch without unit tests that's surprisingly just a few real 
> lines of code (patch 001). If folks are happy with the proposal, I'll write 
> unit tests and also write a patch targeted at trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""

2018-04-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431519#comment-16431519
 ] 

genericqa commented on YARN-8116:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
34s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 76m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8116 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918280/YARN-8116.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f07f3df7260c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 907919d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20280/testReport/ |
| Max. process+thread count | 341 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20280/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Nodemanager fails with NumberFormatException: For input 

[jira] [Commented] (YARN-8100) Support API interface to query cluster attributes and attribute to nodes

2018-04-09 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431485#comment-16431485
 ] 

Naganarasimha G R commented on YARN-8100:
-

Thanks [~bibinchundatt], 

The latest patch looks good to me; I will commit it shortly.

> Support API interface to query cluster attributes and attribute to nodes
> 
>
> Key: YARN-8100
> URL: https://issues.apache.org/jira/browse/YARN-8100
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8100-YARN-3409.001.patch, 
> YARN-8100-YARN-3409.002.patch, YARN-8100-YARN-3409.003.patch, 
> YARN-8100-YARN-3409.004.patch, YARN-8100-YARN-3409.005.patch, 
> YARN-8100-YARN-3409.006.patch, YARN-8100-YARN-3409.007.patch
>
>
> This Jira is to add APIs to query cluster node attributes and attribute-to-node 
> mappings.
> *YarnClient*
> {code}
> getAttributesToNodes()
> getAttributesToNodes(Set attribute)
> getClusterAttributes()
> {code}
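
For context, a minimal sketch of how a client might call these proposed APIs 
once they land (method names are taken from the snippet above; the generic 
type parameters were stripped by the mail renderer, so check the patch for 
the authoritative signatures):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class AttributeQuerySketch {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());  // picks up yarn-site.xml from the classpath
    client.start();
    try {
      // all node attributes known to the cluster
      System.out.println(client.getClusterAttributes());
      // mapping from each attribute to the nodes carrying it
      System.out.println(client.getAttributesToNodes());
    } finally {
      client.stop();
    }
  }
}
{code}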



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8118) Better utilize gracefully decommissioning node managers

2018-04-09 Thread Karthik Palaniappan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431472#comment-16431472
 ] 

Karthik Palaniappan commented on YARN-8118:
---

Not sure I understand your use cases (@Jason/@Junping). For jobs that produce 
shuffle data (i.e. all Hadoop-ecosystem jobs?), killing a container is just as 
bad as removing the shuffle data it produced. I can imagine a few reasonable 
scenarios around removing nodes:

1) immediately remove nodes (regular decommissioning)

2) wait for containers to finish, but don't wait until applications finish 
(scenarios where shuffle doesn't matter)

3) wait for apps to finish and let in-progress apps use decommissioning nodes

#1 is regular (forceful) decommissioning. #3 is my proposal, focused on cloud 
environments with potentially drastic scaling events. #2 makes sense for 
non-cloud environments where few nodes are being removed at a time. It also 
makes sense when running jobs that don't produce shuffle output.

So if you're willing to tolerate a behavioral change, maybe #2 should be the 
default, and #3 should be an additional flag (either an XML property or a flag 
on the graceful decommission request).

However, as currently implemented, graceful decommissioning seems like the 
worst of all worlds: it waits for apps to finish, but doesn't let those apps 
use decommissioning nodes. Am I missing something obvious here? I couldn't 
find anything in the original design docs discussing why it was implemented 
that way.
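
To make the suggestion concrete, here is a sketch of how the switch could look 
if exposed as a client-side configuration knob; the property name and values 
below are purely hypothetical, not existing keys:

{code}
import org.apache.hadoop.conf.Configuration;

public class DecommissionPolicySketch {
  // Hypothetical key for illustration only; it does not exist in yarn-default.xml.
  static final String POLICY_KEY = "yarn.resourcemanager.decommissioning.wait-for";

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // "containers"   -> behavior #2: wait only for running containers to finish
    // "applications" -> behavior #3: wait for apps and keep scheduling their containers
    conf.set(POLICY_KEY, "applications");
    System.out.println("graceful decommission waits for: "
        + conf.get(POLICY_KEY, "containers"));
  }
}
{code}

The same choice could equally ride along as a flag on the graceful 
decommission request itself, as suggested above.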

> Better utilize gracefully decommissioning node managers
> ---
>
> Key: YARN-8118
> URL: https://issues.apache.org/jira/browse/YARN-8118
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.8.2
> Environment: * Google Compute Engine (Dataproc)
>  * Java 8
>  * Hadoop 2.8.2 using client-mode graceful decommissioning
>Reporter: Karthik Palaniappan
>Priority: Major
> Attachments: YARN-8118-branch-2.001.patch
>
>
> Proposal design doc with background + details (please comment directly on 
> doc): 
> [https://docs.google.com/document/d/1hF2Bod_m7rPgSXlunbWGn1cYi3-L61KvQhPlY9Jk9Hk/edit#heading=h.ab4ufqsj47b7]
> tl;dr Right now, DECOMMISSIONING nodes must wait for in-progress applications 
> to complete before shutting down, but they cannot run new containers from 
> those in-progress applications. This is wasteful, particularly in 
> environments where you are billed by resource usage (e.g. EC2).
> Proposal: YARN should schedule containers from in-progress applications on 
> DECOMMISSIONING nodes, but should still avoid scheduling containers from new 
> applications. That will make in-progress applications complete faster and let 
> nodes decommission faster. Overall, this should be cheaper.
> I have a working patch without unit tests that's surprisingly just a few real 
> lines of code (patch 001). If folks are happy with the proposal, I'll write 
> unit tests and also write a patch targeted at trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""

2018-04-09 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431444#comment-16431444
 ] 

Chandni Singh commented on YARN-8116:
-

Patch 2 also includes a check in {{NMLeveldbStateStoreService}} for existing 
empty lists that are already in the db store, along with a test for it.
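
For readers following along, a simplified illustration of the failure mode and 
the kind of guard involved (a generic sketch, not the actual patch code): 
recovery has to tolerate an empty value persisted by an earlier run instead of 
letting {{Long.parseLong("")}} abort NM startup.

{code}
public class RecoveryParseSketch {
  static long parseStartTime(String persisted) {
    if (persisted == null || persisted.isEmpty()) {
      return 0L;                       // treat missing/empty as unknown, keep recovering
    }
    return Long.parseLong(persisted);  // "" here would throw NumberFormatException
  }

  public static void main(String[] args) {
    System.out.println(parseStartTime(""));           // 0 instead of a crash
    System.out.println(parseStartTime("1521840734")); // normal path
  }
}
{code}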

> Nodemanager fails with NumberFormatException: For input string: ""
> --
>
> Key: YARN-8116
> URL: https://issues.apache.org/jira/browse/YARN-8116
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8116.001.patch, YARN-8116.002.patch
>
>
> Steps followed.
> 1) Update nodemanager debug delay config
> {code}
> <property>
>   <name>yarn.nodemanager.delete.debug-delay-sec</name>
>   <value>350</value>
> </property>
> {code}
> 2) Launch distributed shell application multiple times
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn  jar 
> hadoop-yarn-applications-distributedshell-*.jar  -shell_command "sleep 120" 
> -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> hadoop-yarn-applications-distributedshell-*.jar{code}
> 3) restart NM
> Nodemanager fails to start with below error.
> {code}
> {code:title=NM log}
> 2018-03-23 21:32:14,437 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: 
> true
> 2018-03-23 21:32:14,439 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set 
> as 3600. The logs will be aggregated every 3600 seconds
> 2018-03-23 21:32:14,455 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  failed in state INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960)
> 2018-03-23 21:32:14,458 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceStop(148)) - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
>  waiting for pending aggregation during exit
> 2018-03-23 21:32:14,460 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state 
> INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> 

[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable

2018-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431442#comment-16431442
 ] 

Hudson commented on YARN-7667:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13947 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13947/])
YARN-7667. Docker Stop grace period should be configurable. Contributed (jlowe: 
rev 907919d28c1b7e4496d189b46ecbb86a10d41339)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


> Docker Stop grace period should be configurable
> ---
>
> Key: YARN-7667
> URL: https://issues.apache.org/jira/browse/YARN-7667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7667.001.patch, YARN-7667.002.patch, 
> YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch, 
> YARN-7667.006.patch
>
>
> {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never 
> called. So, the stop uses the 10 second default grace period from docker
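
For reference, a sketch of how the now-configurable grace period flows through 
(the property key below is assumed from the patch; verify it against the 
yarn-default.xml change in the commit):

{code}
import org.apache.hadoop.conf.Configuration;

public class DockerStopSketch {
  // Assumed key added by YARN-7667; docker's own default grace period is 10s.
  static final String KEY =
      "yarn.nodemanager.runtime.linux.docker.stop.grace-period";

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    int grace = conf.getInt(KEY, 10);
    // conceptually equivalent CLI: docker stop --time=<grace> <container-id>
    System.out.println("docker stop grace period: " + grace + "s");
  }
}
{code}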



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""

2018-04-09 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8116:

Attachment: YARN-8116.002.patch

> Nodemanager fails with NumberFormatException: For input string: ""
> --
>
> Key: YARN-8116
> URL: https://issues.apache.org/jira/browse/YARN-8116
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8116.001.patch, YARN-8116.002.patch
>
>
> Steps followed.
> 1) Update nodemanager debug delay config
> {code}
> <property>
>   <name>yarn.nodemanager.delete.debug-delay-sec</name>
>   <value>350</value>
> </property>
> {code}
> 2) Launch distributed shell application multiple times
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn  jar 
> hadoop-yarn-applications-distributedshell-*.jar  -shell_command "sleep 120" 
> -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> hadoop-yarn-applications-distributedshell-*.jar{code}
> 3) restart NM
> Nodemanager fails to start with below error.
> {code}
> {code:title=NM log}
> 2018-03-23 21:32:14,437 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: 
> true
> 2018-03-23 21:32:14,439 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set 
> as 3600. The logs will be aggregated every 3600 seconds
> 2018-03-23 21:32:14,455 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  failed in state INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960)
> 2018-03-23 21:32:14,458 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceStop(148)) - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
>  waiting for pending aggregation during exit
> 2018-03-23 21:32:14,460 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state 
> INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   at 
> 

[jira] [Commented] (YARN-7941) Transitive dependencies for component are not resolved

2018-04-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431439#comment-16431439
 ] 

genericqa commented on YARN-7941:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  3s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 32s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  6m 
45s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m  7s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7941 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918263/YARN-7941.1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0e277bde1947 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9059376 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20279/testReport/ |
| Max. process+thread count | 669 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20279/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Transitive dependencies 

[jira] [Commented] (YARN-7939) Yarn Service Upgrade: add support to upgrade a component instance

2018-04-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431423#comment-16431423
 ] 

genericqa commented on YARN-7939:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
52s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  7m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
31s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 28s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 60 new + 401 unchanged - 2 fixed = 461 total (was 403) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  7s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 28m  
1s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m  
3s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
37s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}118m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7939 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918251/YARN-7939.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 23c06273ed03 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build 

[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable

2018-04-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431404#comment-16431404
 ] 

genericqa commented on YARN-7667:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 27 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui hadoop-mapreduce-project . 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 13m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m  
2s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 37m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 25m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 25m 
34s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 55s{color} | {color:orange} root: The patch generated 2 new + 1249 unchanged 
- 1 fixed = 1251 total (was 1250) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 4s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
13s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
5s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 22s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui hadoop-mapreduce-project . 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 14m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | 

[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431369#comment-16431369
 ] 

Wangda Tan commented on YARN-8135:
--

[~oliverhuh...@gmail.com], 

Thanks for the responses, 
{quote}What does w/o modification mean?
{quote}
It means running a vanilla TF program on the framework without modifying it.
{quote}As long as Kubeflow is deployed in the same cluster as Hadoop, Kubeflow 
should be able to access HDFS through the libhdfs or webhdfs interface?
{quote}
Since TensorFlow supports reading from HDFS, ideally every platform can 
support this :). What I meant is that reading HDFS from TF takes a lot of 
configuration, plus some specific optimizations / considerations to make HDFS 
access from a Docker container easier. Our ongoing prototype covers some of 
this.
{quote}ToS kind of supports GPU scheduling (not isolation) based on memory: if 
you ask for 1 GPU and a machine has 4 GPUs, it asks for total memory * the 
fraction of GPUs you requested.
{quote}
This is not easy for users and cannot guarantee proper isolation, so I didn't 
put a (√) for ToS.

 

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can let human to explore deep 
> places. B-)
> Compare to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU Isolation of XLearning project is achieved by patched YARN, which is 
> different from community’s GPU isolation solution.
> **XLearning needs few modification to read ClusterSpec from env.
> *References:*
>  - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5268) DShell AM fails java.lang.InterruptedException

2018-04-09 Thread Zian Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen resolved YARN-5268.
-
  Resolution: Cannot Reproduce
Release Note: Tried to reproduce the issue on a cluster with the latest code; 
the DShell application completed successfully with the command provided in the 
description. Closing it as "cannot reproduce".

> DShell AM fails java.lang.InterruptedException
> --
>
> Key: YARN-5268
> URL: https://issues.apache.org/jira/browse/YARN-5268
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Zian Chen
>Priority: Critical
>  Labels: oct16-easy
> Attachments: YARN-5268.1.patch
>
>
> Distributed Shell AM failed with the following error
> {Code}
> 16/06/16 11:08:10 INFO impl.NMClientAsyncImpl: NMClient stopped.
> 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Application 
> completed. Signalling finish to RM
> 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Diagnostics., 
> total=16, completed=19, allocated=21, failed=4
> 16/06/16 11:08:10 INFO impl.AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> 16/06/16 11:08:10 INFO distributedshell.ApplicationMaster: Application Master 
> failed. exiting
> 16/06/16 11:08:10 INFO impl.AMRMClientAsyncImpl: Interrupted while waiting 
> for queue
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
>   at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>   at 
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)
> End of LogType:AppMaster.stderr
> {Code}
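
For context, the INFO line in the log above is the AM's callback handler 
thread being interrupted while parked on its event queue during shutdown. A 
minimal, generic sketch of that pattern (illustrative only, not the actual 
AMRMClientAsyncImpl code):

{code}
import java.util.concurrent.LinkedBlockingQueue;

public class CallbackDrainSketch {
  public static void main(String[] args) throws Exception {
    final LinkedBlockingQueue<String> events = new LinkedBlockingQueue<>();
    Thread handler = new Thread(() -> {
      try {
        while (true) {
          System.out.println("handling " + events.take()); // blocks until an event arrives
        }
      } catch (InterruptedException e) {
        // expected during shutdown: the interrupt wakes the thread parked in take()
        System.out.println("handler interrupted, exiting");
      }
    });
    handler.start();
    events.put("onContainersCompleted");
    Thread.sleep(100);   // let the event drain
    handler.interrupt(); // mirrors the client stopping its callback thread
    handler.join();
  }
}
{code}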



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8136) Add version attribute to site doc examples and quickstart

2018-04-09 Thread Gour Saha (JIRA)
Gour Saha created YARN-8136:
---

 Summary: Add version attribute to site doc examples and quickstart
 Key: YARN-8136
 URL: https://issues.apache.org/jira/browse/YARN-8136
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: site
Reporter: Gour Saha


The version attribute is missing in the following 2 site doc files:

src/site/markdown/yarn-service/Examples.md
src/site/markdown/yarn-service/QuickStart.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-09 Thread Keqiu Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431345#comment-16431345
 ] 

Keqiu Hu commented on YARN-8135:


1. What does w/o modification mean?

2. As long as Kubeflow is deployed in the same cluster as Hadoop, Kubeflow 
should be able to access HDFS through the libhdfs or webhdfs interface?

3. ToS kind of supports GPU scheduling (not isolation) based on memory: if you 
ask for 1 GPU and a machine has 4 GPUs, it asks for total memory * the 
fraction of GPUs you requested.

 

Love the name and the curly braces {:) }

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can let human to explore deep 
> places. B-)
> Compare to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU Isolation of XLearning project is achieved by patched YARN, which is 
> different from community’s GPU isolation solution.
> **XLearning needs few modification to read ClusterSpec from env.
> *References:*
>  - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Description: 
Description:

*Goals:*
 - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
 - Support launch tensorboard if user specified.
 - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle can let human to explore deep places. 
B-)

Compare to other projects:

!image-2018-04-09-14-44-41-101.png!

*Notes:*

*GPU Isolation of XLearning project is achieved by patched YARN, which is 
different from community’s GPU isolation solution.

**XLearning needs few modification to read ClusterSpec from env.

*References:*
 - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
 - TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
 - Spark Deep Learning (Databricks): 
[https://github.com/databricks/spark-deep-learning]
 - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
 - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]

  was:
Description:

*Goals:*
 - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
 - Support launch tensorboard if user specified.
 - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle can take human to deep places. B-)

Compare to other projects:

!image-2018-04-09-14-44-41-101.png!

*Notes:*

*GPU Isolation of XLearning project is achieved by patched YARN, which is 
different from community’s GPU isolation solution.

**XLearning needs few modification to read ClusterSpec from env.

*References:*
 - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
 - TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
 - Spark Deep Learning (Databricks): 
[https://github.com/databricks/spark-deep-learning]
 - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
 - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]


> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can let human to explore deep 
> places. B-)
> Compare to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU Isolation of XLearning project is achieved by patched YARN, which is 
> different from community’s GPU isolation solution.
> **XLearning needs few modification to read ClusterSpec from env.
> *References:*
>  - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Description: 
Description:

*Goals:*
 - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
 - Support launch tensorboard if user specified.
 - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle can take human to deep places. B-)

Compare to other projects:

!image-2018-04-09-14-44-41-101.png!

*Notes:*

*GPU Isolation of XLearning project is achieved by patched YARN, which is 
different from community’s GPU isolation solution.

**XLearning needs few modification to read ClusterSpec from env.

*References:*
 - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
 - TensorFlowOnYARN (Intel): [https://github.com/Intel-bigdata/TensorFlowOnYARN]
 - Spark Deep Learning (Databricks): 
[https://github.com/databricks/spark-deep-learning]
 - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
 - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]

  was:
Description:

*Goals:*
 - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
 - Support launch tensorboard if user specified.
 - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle can take human to deep places. B-)

Compare to other projects:

!image-2018-04-09-14-35-16-778.png!

*Notes:*

* GPU Isolation of XLearning project is achieved by patched YARN, which is 
different from community’s GPU isolation solution.

** XLearning needs few modification to read ClusterSpec from env.

*References:*

- TensorflowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark
- TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN
- Spark Deep Learning (Databricks): 
https://github.com/databricks/spark-deep-learning
- XLearning (Qihoo360): https://github.com/Qihoo360/XLearning
- Kubeflow (Google): https://github.com/kubeflow/kubeflow


> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can take human to deep places. B-)
> Compare to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU Isolation of XLearning project is achieved by patched YARN, which is 
> different from community’s GPU isolation solution.
> **XLearning needs few modification to read ClusterSpec from env.
> *References:*
>  - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8135:
-
Attachment: image-2018-04-09-14-44-41-101.png

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineer / data scientist to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support run distributed Tensorflow jobs with simple configs.
>  - Support run user-specified Docker images.
>  - Support specify GPU and other resources.
>  - Support launch tensorboard if user specified.
>  - Support customized DNS name for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle can take human to deep places. B-)
> Compare to other projects:
> !image-2018-04-09-14-35-16-778.png!
> *Notes:*
> * GPU Isolation of XLearning project is achieved by patched YARN, which is 
> different from community’s GPU isolation solution.
> ** XLearning needs few modification to read ClusterSpec from env.
> *References:*
> - TensorflowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark
> - TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN
> - Spark Deep Learning (Databricks): 
> https://github.com/databricks/spark-deep-learning
> - XLearning (Qihoo360): https://github.com/Qihoo360/XLearning
> - Kubeflow (Google): https://github.com/kubeflow/kubeflow



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7512) Support service upgrade via YARN Service API and CLI

2018-04-09 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-7512:

Target Version/s: 3.1.1

> Support service upgrade via YARN Service API and CLI
> 
>
> Key: YARN-7512
> URL: https://issues.apache.org/jira/browse/YARN-7512
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Gour Saha
>Assignee: Chandni Singh
>Priority: Major
> Fix For: yarn-native-services
>
> Attachments: _In-Place Upgrade of Long-Running Applications in 
> YARN_v1.pdf, _In-Place Upgrade of Long-Running Applications in YARN_v2.pdf, 
> _In-Place Upgrade of Long-Running Applications in YARN_v3.pdf
>
>
> YARN Service API and CLI needs to support service (and containers) upgrade in 
> line with what Slider supported in SLIDER-787 
> (http://slider.incubator.apache.org/docs/slider_specs/application_pkg_upgrade.html)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8081) Yarn Service Upgrade: Add support to upgrade a component

2018-04-09 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8081:

Target Version/s: 3.1.1

> Yarn Service Upgrade: Add support to upgrade a component
> 
>
> Key: YARN-8081
> URL: https://issues.apache.org/jira/browse/YARN-8081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8052) Move overwriting of service definition during flex to service master

2018-04-09 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8052:

Target Version/s: 3.1.1

> Move overwriting of service definition during flex to service master
> 
>
> Key: YARN-8052
> URL: https://issues.apache.org/jira/browse/YARN-8052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> The overwrite of the service definition during flex is currently done from the 
> ServiceClient. 
> During auto finalization of an upgrade, the current service definition gets 
> overwritten by the service master as well. This creates a potential conflict. 
> Need to move the overwrite of the service definition during flex to the 
> service master. 
> Discussed on YARN-8018.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431333#comment-16431333
 ] 

Wangda Tan commented on YARN-8135:
--

I'm currently working on a design doc and a prototype, and will share more details 
in the next several days.

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineers / data scientists to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs easy access to data/models in HDFS and other storages.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support running distributed Tensorflow jobs with simple configs.
>  - Support running user-specified Docker images.
>  - Support specifying GPU and other resources.
>  - Support launching Tensorboard if specified by the user.
>  - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because Submarine is the only vehicle that can take humans to deep places. B-)
> Compared to other projects:
> !image-2018-04-09-14-35-16-778.png!
> *Notes:*
> * GPU isolation in the XLearning project is achieved by a patched YARN, which 
> is different from the community's GPU isolation solution.
> ** XLearning needs a few modifications to read the ClusterSpec from env.
> *References:*
> - TensorflowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark
> - TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN
> - Spark Deep Learning (Databricks): 
> https://github.com/databricks/spark-deep-learning
> - XLearning (Qihoo360): https://github.com/Qihoo360/XLearning
> - Kubeflow (Google): https://github.com/kubeflow/kubeflow



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-09 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8135:


 Summary: Hadoop {Submarine} Project: Simple and scalable 
deployment of deep learning training / serving jobs on Hadoop
 Key: YARN-8135
 URL: https://issues.apache.org/jira/browse/YARN-8135
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: image-2018-04-09-14-35-16-778.png

Description:

*Goals:*
 - Allow infra engineers / data scientists to run *unmodified* Tensorflow jobs on 
YARN.
 - Allow jobs easy access to data/models in HDFS and other storages.
 - Can launch services to serve Tensorflow/MXNet models.
 - Support running distributed Tensorflow jobs with simple configs.
 - Support running user-specified Docker images.
 - Support specifying GPU and other resources.
 - Support launching Tensorboard if specified by the user.
 - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)

*Why this name?*
 - Because Submarine is the only vehicle that can take humans to deep places. B-)

Compared to other projects:

!image-2018-04-09-14-35-16-778.png!

*Notes:*

* GPU isolation in the XLearning project is achieved by a patched YARN, which is 
different from the community's GPU isolation solution.

** XLearning needs a few modifications to read the ClusterSpec from env.

*References:*

- TensorflowOnSpark (Yahoo): https://github.com/yahoo/TensorFlowOnSpark
- TensorFlowOnYARN (Intel): https://github.com/Intel-bigdata/TensorFlowOnYARN
- Spark Deep Learning (Databricks): 
https://github.com/databricks/spark-deep-learning
- XLearning (Qihoo360): https://github.com/Qihoo360/XLearning
- Kubeflow (Google): https://github.com/kubeflow/kubeflow



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7941) Transitive dependencies for component are not resolved

2018-04-09 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi reassigned YARN-7941:


Assignee: Billie Rinaldi

> Transitive dependencies for component are not resolved 
> ---
>
> Key: YARN-7941
> URL: https://issues.apache.org/jira/browse/YARN-7941
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-7941.1.patch
>
>
> It is observed that transitive dependencies are not resolved; as a result, one 
> of the components is started too early. 
> Ex: In an HBase app, 
> master is an independent component, 
> regionserver depends on master, and 
> hbaseclient depends on regionserver, 
> but hbaseclient is always launched before regionserver.
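
For illustration, launching components in topological order of the (transitive)
dependency graph gives the behavior the description asks for. This is a minimal
sketch, not the YARN-7941 patch itself; all names are hypothetical:
{code:java}
import java.util.*;

// Sketch only: launch components in topological order of their dependencies.
public class ComponentOrder {
  public static List<String> launchOrder(Map<String, List<String>> dependsOn) {
    List<String> order = new ArrayList<>();
    Set<String> visited = new HashSet<>();
    for (String c : dependsOn.keySet()) {
      visit(c, dependsOn, visited, order);
    }
    return order; // dependencies always precede their dependents
  }

  private static void visit(String c, Map<String, List<String>> dependsOn,
      Set<String> visited, List<String> order) {
    if (!visited.add(c)) {
      return; // already handled (cycle detection omitted for brevity)
    }
    for (String dep : dependsOn.getOrDefault(c, Collections.emptyList())) {
      visit(dep, dependsOn, visited, order);
    }
    order.add(c);
  }

  public static void main(String[] args) {
    Map<String, List<String>> deps = new HashMap<>();
    deps.put("master", Collections.emptyList());
    deps.put("regionserver", Collections.singletonList("master"));
    deps.put("hbaseclient", Collections.singletonList("regionserver"));
    // Prints [master, regionserver, hbaseclient]
    System.out.println(launchOrder(deps));
  }
}
{code}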



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7941) Transitive dependencies for component are not resolved

2018-04-09 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-7941:
-
Attachment: YARN-7941.1.patch

> Transitive dependencies for component are not resolved 
> ---
>
> Key: YARN-7941
> URL: https://issues.apache.org/jira/browse/YARN-7941
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-7941.1.patch
>
>
> It is observed that transitive dependencies are not resolved; as a result, one 
> of the components is started too early. 
> Ex: In an HBase app, 
> master is an independent component, 
> regionserver depends on master, and 
> hbaseclient depends on regionserver, 
> but hbaseclient is always launched before regionserver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7189) Container-executor doesn't remove Docker containers that error out early

2018-04-09 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7189:
--
Affects Version/s: 2.9.0
   2.8.3
   3.0.1
  Description: 
Once the docker run command is executed, the docker container is created unless 
the return code is 125, which means the run command itself failed 
(https://docs.docker.com/engine/reference/run/#exit-status). For any error that 
happens after docker run, the container needs to be removed during cleanup.

{noformat:title=container-executor.c:launch_docker_container_as_user}
  snprintf(docker_command_with_binary, command_size, "%s %s", docker_binary, 
docker_command);

  fprintf(LOGFILE, "Launching docker container...\n");
  FILE* start_docker = popen(docker_command_with_binary, "r");
{noformat}

This is fixed by YARN-5366, which changes how we remove containers. However, 
that was committed to 3.1.0, so 2.8, 2.9, and 3.0 are all affected.

  was:
Once the docker run command is executed, the docker container is created unless 
the return code is 125, which means the run command itself failed 
(https://docs.docker.com/engine/reference/run/#exit-status). For any error that 
happens after docker run, the container needs to be removed during cleanup.

{noformat:title=container-executor.c:launch_docker_container_as_user}
  snprintf(docker_command_with_binary, command_size, "%s %s", docker_binary, 
docker_command);

  fprintf(LOGFILE, "Launching docker container...\n");
  FILE* start_docker = popen(docker_command_with_binary, "r");
{noformat}


> Container-executor doesn't remove Docker containers that error out early
> 
>
> Key: YARN-7189
> URL: https://issues.apache.org/jira/browse/YARN-7189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.9.0, 2.8.3, 3.0.1
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>
> Once the docker run command is executed, the docker container is created 
> unless the return code is 125, which means the run command itself failed 
> (https://docs.docker.com/engine/reference/run/#exit-status). For any error that 
> happens after docker run, the container needs to be removed during cleanup.
> {noformat:title=container-executor.c:launch_docker_container_as_user}
>   snprintf(docker_command_with_binary, command_size, "%s %s", docker_binary, 
> docker_command);
>   fprintf(LOGFILE, "Launching docker container...\n");
>   FILE* start_docker = popen(docker_command_with_binary, "r");
> {noformat}
> This is fixed by YARN-5366, which changes how we remove containers. However, 
> that was committed to 3.1.0, so 2.8, 2.9, and 3.0 are all affected.
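
For illustration, the cleanup rule the description implies looks like the sketch
below; the helper is hypothetical and this is not the actual container-executor
or YARN-5366 code:
{code:java}
// Sketch only: clean up the container unless docker run itself failed.
// Exit status 125 means the run command failed, so no container was created.
static void cleanupIfNeeded(int exitCode, String containerId)
    throws java.io.IOException, InterruptedException {
  if (exitCode != 0 && exitCode != 125) {
    // The container was created but errored out after launch: remove it.
    new ProcessBuilder("docker", "rm", containerId).inheritIO().start().waitFor();
  }
}
{code}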



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable

2018-04-09 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431317#comment-16431317
 ] 

Shane Kumpf commented on YARN-7667:
---

The latest patch lgtm.

> Docker Stop grace period should be configurable
> ---
>
> Key: YARN-7667
> URL: https://issues.apache.org/jira/browse/YARN-7667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-7667.001.patch, YARN-7667.002.patch, 
> YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch, 
> YARN-7667.006.patch
>
>
> {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never 
> called, so the stop uses the 10-second default grace period from Docker.
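
For illustration, the fix presumably boils down to wiring a configuration value
into the existing setter, along these lines (the property name below is a
placeholder, not necessarily the key the patch adds):
{code:java}
// Sketch only: make the grace period configurable instead of relying on
// Docker's 10-second default. The config key here is a placeholder.
int gracePeriod = conf.getInt(
    "yarn.nodemanager.runtime.linux.docker.stop.grace-period", 10);
DockerStopCommand stopCommand = new DockerStopCommand(containerName);
stopCommand.setGracePeriod(gracePeriod);
{code}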



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8018) Yarn Service Upgrade: Add support for initiating service upgrade

2018-04-09 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8018:

Target Version/s: 3.1.1

> Yarn Service Upgrade: Add support for initiating service upgrade
> 
>
> Key: YARN-8018
> URL: https://issues.apache.org/jira/browse/YARN-8018
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8018.001.patch, YARN-8018.002.patch, 
> YARN-8018.003.patch, YARN-8018.004.patch, YARN-8018.005.patch, 
> YARN-8018.006.patch, YARN-8018.007.patch
>
>
> Add support for initiating a service upgrade, which includes the following main 
> changes:
>  # Service API to initiate the upgrade
>  # Persist the service version on HDFS
>  # Start the upgraded version of the service



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7939) Yarn Service Upgrade: add support to upgrade a component instance

2018-04-09 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-7939:

Target Version/s: 3.1.1

> Yarn Service Upgrade: add support to upgrade a component instance 
> --
>
> Key: YARN-7939
> URL: https://issues.apache.org/jira/browse/YARN-7939
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-7939.001.patch, YARN-7939.002.patch, 
> YARN-7939.003.patch, YARN-7939.004.patch
>
>
> Yarn core supports in-place upgrade of containers. A yarn service can 
> leverage that to provide in-place upgrade of component instances. Please see 
> YARN-7512 for details.
> Will add support to upgrade a single component instance first and then 
> iteratively add other APIs and features.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable

2018-04-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431176#comment-16431176
 ] 

Jason Lowe commented on YARN-7667:
--

The TestContainerManager failure is unrelated, see YARN-7145.  The 
TestContainerSchedulerQueuing failure is being tracked by YARN-7700.

+1 lgtm.  I'll wait to make sure Shane is good with the latest patch before 
committing.

> Docker Stop grace period should be configurable
> ---
>
> Key: YARN-7667
> URL: https://issues.apache.org/jira/browse/YARN-7667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-7667.001.patch, YARN-7667.002.patch, 
> YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch, 
> YARN-7667.006.patch
>
>
> {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never 
> called, so the stop uses the 10-second default grace period from Docker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7939) Yarn Service Upgrade: add support to upgrade a component instance

2018-04-09 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-7939:

Attachment: YARN-7939.004.patch

> Yarn Service Upgrade: add support to upgrade a component instance 
> --
>
> Key: YARN-7939
> URL: https://issues.apache.org/jira/browse/YARN-7939
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-7939.001.patch, YARN-7939.002.patch, 
> YARN-7939.003.patch, YARN-7939.004.patch
>
>
> Yarn core supports in-place upgrade of containers. A yarn service can 
> leverage that to provide in-place upgrade of component instances. Please see 
> YARN-7512 for details.
> Will add support to upgrade a single component instance first and then 
> iteratively add other APIs and features.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable

2018-04-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431153#comment-16431153
 ] 

genericqa commented on YARN-7667:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  7s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
12s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 30s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}110m 28s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestContainerManager |
|   | 
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7667 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918219/YARN-7667.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| 

[jira] [Commented] (YARN-7939) Yarn Service Upgrade: add support to upgrade a component instance

2018-04-09 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431149#comment-16431149
 ] 

Gour Saha commented on YARN-7939:
-

bq. There is no {{NEEDS_UPGRADE}} state at service level, so the json that you 
posted in your example for service level is incorrect.

[~csingh], shouldn't the API submission itself have thrown a validation error, 
since the ServiceState NEEDS_UPGRADE does not even exist?

> Yarn Service Upgrade: add support to upgrade a component instance 
> --
>
> Key: YARN-7939
> URL: https://issues.apache.org/jira/browse/YARN-7939
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-7939.001.patch, YARN-7939.002.patch, 
> YARN-7939.003.patch
>
>
> Yarn core supports in-place upgrade of containers. A yarn service can 
> leverage that to provide in-place upgrade of component instances. Please see 
> YARN-7512 for details.
> Will add support to upgrade a single component instance first and then 
> iteratively add other APIs and features.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-09 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431130#comment-16431130
 ] 

Gour Saha commented on YARN-8133:
-

[~rohithsharma], thank you for the patch. Similar problems exist in all of 
these files. Can you fix them as well?

src/site/markdown/yarn-service/Concepts.md
src/site/markdown/yarn-service/QuickStart.md
src/site/markdown/yarn-service/RegistryDNS.md
src/site/markdown/yarn-service/ServiceDiscovery.md

> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-8133.01.patch
>
>
> I see that the documentation links are broken from the overview page. 
> Clicking any link from the 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  page causes an error. 
> It looks like the Overview page redirects to .md pages, which don't exist. 
> It should redirect to *.html pages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7930) Add configuration to initialize RM with configured labels.

2018-04-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431066#comment-16431066
 ] 

genericqa commented on YARN-7930:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
47s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 267 unchanged - 0 fixed = 268 total (was 267) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  6s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 44s{color} 
| {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m  7s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 85m 23s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
|   | hadoop.yarn.nodelabels.TestCommonNodeLabelsManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7930 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918202/YARN-7930.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2bd04a30ccc2 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e9b9f48 |
| maven | version: Apache Maven 3.3.9 |

[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""

2018-04-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431063#comment-16431063
 ] 

Wangda Tan commented on YARN-8116:
--

[~csingh], thanks for working on the fix. It's better to include a simple UT to 
avoid regressions, since this is on a critical path of NM recovery.
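
For illustration, a defensive parse along these lines (with a UT over the
empty-string case) would prevent the crash; this is a sketch, not necessarily
the shape of the actual YARN-8116 patch:
{code:java}
// Sketch only: tolerate an empty recovered value instead of letting
// Long.parseLong("") throw NumberFormatException during NM recovery.
private static long parseLongOrDefault(String value, long defaultValue) {
  if (value == null || value.isEmpty()) {
    return defaultValue;
  }
  return Long.parseLong(value);
}
{code}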

> Nodemanager fails with NumberFormatException: For input string: ""
> --
>
> Key: YARN-8116
> URL: https://issues.apache.org/jira/browse/YARN-8116
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8116.001.patch
>
>
> Steps followed.
> 1) Update nodemanager debug delay config
> {code}
> <property>
>   <name>yarn.nodemanager.delete.debug-delay-sec</name>
>   <value>350</value>
> </property>
> {code}
> 2) Launch distributed shell application multiple times
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn  jar 
> hadoop-yarn-applications-distributedshell-*.jar  -shell_command "sleep 120" 
> -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> hadoop-yarn-applications-distributedshell-*.jar{code}
> 3) restart NM
> Nodemanager fails to start with the below error.
> {code:title=NM log}
> 2018-03-23 21:32:14,437 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: 
> true
> 2018-03-23 21:32:14,439 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set 
> as 3600. The logs will be aggregated every 3600 seconds
> 2018-03-23 21:32:14,455 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  failed in state INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960)
> 2018-03-23 21:32:14,458 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceStop(148)) - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
>  waiting for pending aggregation during exit
> 2018-03-23 21:32:14,460 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state 
> INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> 

[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""

2018-04-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431046#comment-16431046
 ] 

genericqa commented on YARN-8116:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 40s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 48s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 72m 11s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8116 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918195/YARN-8116.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5a4b21daf9e5 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 
21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e9b9f48 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/20276/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20276/testReport/ |
| Max. process+thread count | 410 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-8100) Support API interface to query cluster attributes and attribute to nodes

2018-04-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431039#comment-16431039
 ] 

genericqa commented on YARN-8100:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-3409 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
45s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m  
0s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
37s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
13s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
10s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
YARN-3409 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
6s{color} | {color:green} YARN-3409 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 26m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
25s{color} | {color:green} root: The patch generated 0 new + 471 unchanged - 3 
fixed = 471 total (was 474) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 25s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
46s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
22s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
19s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 37s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 
39s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
32s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 55s{color} 
| {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | 

[jira] [Commented] (YARN-8131) Provide CLI option to DS for publishing entities into sub application

2018-04-09 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431028#comment-16431028
 ] 

Haibo Chen commented on YARN-8131:
--

I'd argue that we'd like to mark the subApplication entity as LimitedPrivate("Tez") 
specifically for this reason. The SubApplicationEntity is designed specifically 
to address Tez's use case, where one YARN AM is shared to run multiple user 
queries. Hence, we should rely on Tez or a similar use case to test the 
SubApplicationEntity API.

> Provide CLI option to DS for publishing entities into sub application
> -
>
> Key: YARN-8131
> URL: https://issues.apache.org/jira/browse/YARN-8131
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Priority: Major
>
> Post YARN-6936, TimelineV2Client exposes an API to publish entities into the 
> sub application table. We should add this CLI option in DS so that the API can 
> be tested. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable

2018-04-09 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431008#comment-16431008
 ] 

Eric Badger commented on YARN-7667:
---

[~shaneku...@gmail.com], shoot, that's pretty embarrassing. I updated trunk but 
forgot to rebase. After a quick rebase, patch 006 should now be patch 004 plus 
the one checkstyle fix. Sorry for the weird patch 005.

> Docker Stop grace period should be configurable
> ---
>
> Key: YARN-7667
> URL: https://issues.apache.org/jira/browse/YARN-7667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-7667.001.patch, YARN-7667.002.patch, 
> YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch, 
> YARN-7667.006.patch
>
>
> {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never 
> called, so the stop uses the 10-second default grace period from Docker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7667) Docker Stop grace period should be configurable

2018-04-09 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7667:
--
Attachment: YARN-7667.006.patch

> Docker Stop grace period should be configurable
> ---
>
> Key: YARN-7667
> URL: https://issues.apache.org/jira/browse/YARN-7667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-7667.001.patch, YARN-7667.002.patch, 
> YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch, 
> YARN-7667.006.patch
>
>
> {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never 
> called, so the stop uses the 10-second default grace period from Docker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable

2018-04-09 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430998#comment-16430998
 ] 

Shane Kumpf commented on YARN-7667:
---

Thanks for updating the patch, [~ebadger]. I tested the 004 patch, as the 005 
patch doesn't look right, and 004 looks good to me aside from that checkstyle 
issue. +1 (non-binding) once checkstyle is addressed.

> Docker Stop grace period should be configurable
> ---
>
> Key: YARN-7667
> URL: https://issues.apache.org/jira/browse/YARN-7667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-7667.001.patch, YARN-7667.002.patch, 
> YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch
>
>
> {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never 
> called, so the stop uses the 10-second default grace period from Docker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8131) Provide CLI option to DS for publishing entities into sub application

2018-04-09 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430992#comment-16430992
 ] 

Rohith Sharma K S commented on YARN-8131:
-

Primarily, DS is used for verifying YARN service features. Though *logically* DS 
doesn't come under the sub application concept, there should be a way to publish 
into the sub app table so that this feature can be verified. If we don't want to 
provide a CLI option, maybe the DS AM can use the newer API by default so that 
data goes into both tables. Otherwise, this API will remain untested until Tez 
or some other framework makes use of it.
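
For reference, publishing into the sub application table via the YARN-6936 API
looks roughly like the sketch below; the method name is assumed from this
discussion and should be checked against TimelineV2Client in trunk:
{code:java}
// Sketch only: publish a timeline entity into the sub application table.
TimelineV2Client client = TimelineV2Client.createTimelineClient(appId);
client.init(conf);
client.start();

TimelineEntity entity = new TimelineEntity();
entity.setType("DS_APP_ATTEMPT");
entity.setId("appattempt_1");
client.putSubAppEntities(entity);  // vs. putEntities() for the app table
{code}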

> Provide CLI option to DS for publishing entities into sub application
> -
>
> Key: YARN-8131
> URL: https://issues.apache.org/jira/browse/YARN-8131
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Priority: Major
>
> Post YARN-6936, TimelineV2Client exposes an API to publish entities into the 
> sub application table. We should add this CLI option in DS so that the API can 
> be tested. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-09 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-8133:
---

Assignee: Rohith Sharma K S

> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-8133.01.patch
>
>
> I see that the documentation links are broken from the overview page. 
> Clicking any link from the 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  page causes an error. 
> It looks like the Overview page redirects to .md pages, which don't exist. 
> It should redirect to *.html pages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8134) Support specifying node resources in SLS

2018-04-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430980#comment-16430980
 ] 

genericqa commented on YARN-8134:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 12s{color} | {color:orange} hadoop-tools/hadoop-sls: The patch generated 3 
new + 14 unchanged - 0 fixed = 17 total (was 14) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  6s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
49s{color} | {color:red} hadoop-tools/hadoop-sls generated 1 new + 0 unchanged 
- 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
35s{color} | {color:green} hadoop-sls in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
21s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-sls |
|  |  org.apache.hadoop.yarn.sls.SLSRunner.startNM() makes inefficient use of 
keySet iterator instead of entrySet iterator  At SLSRunner.java:keySet iterator 
instead of entrySet iterator  At SLSRunner.java:[line 340] |
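
For context, this FindBugs pattern flags map traversals that re-fetch each value
through its key. A minimal sketch of the flagged and preferred forms (hypothetical
map, not the actual SLSRunner code):
{code:java}
import java.util.HashMap;
import java.util.Map;

public class EntrySetSketch {
  public static void main(String[] args) {
    Map<String, Integer> nmCounts = new HashMap<>();
    nmCounts.put("node1", 4);

    // Flagged form: one extra hash lookup per key via get().
    for (String key : nmCounts.keySet()) {
      System.out.println(key + " -> " + nmCounts.get(key));
    }

    // Preferred form: each entry already carries both key and value.
    for (Map.Entry<String, Integer> e : nmCounts.entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
{code}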
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918196/YARN-8134.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8d557f569f3f 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ac32b35 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20274/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt
 |
| findbugs | 

[jira] [Comment Edited] (YARN-7939) Yarn Service Upgrade: add support to upgrade a component instance

2018-04-09 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430951#comment-16430951
 ] 

Chandni Singh edited comment on YARN-7939 at 4/9/18 5:58 PM:
-

[~eyang]

Upgrade is either 2 steps when finalization is done automatically, or 3 steps 
when finalization is done manually:

Step 1: Initiate service level upgrade. This requires posting the newer spec. 
Here is an example:
{code:java}
{
  "name": "test1",
  "version": "v2",
  "state": "UPGRADING",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 120",
      "resource": {
        "cpus": 1,
        "memory": "256"
      }
    }
  ]
}{code}
This JSON is the spec JSON, not the state JSON. 

There is no {{NEEDS_UPGRADE}} state at the service level, so the JSON that you 
posted in your example for the service level is incorrect.

 

Step 2: Trigger upgrade of component or individual component instances.

An example of this request is
{code:java}
{
"state": "UPGRADING",
"component_instance_name": "sleeper-0"
}{code}

The {{NEEDS_UPGRADE}} state is not something the user specifies. All the components 
and their instances which have changes (this is figured out when the new spec 
is provided) have their state set to {{NEEDS_UPGRADE}}.  This tells the user which 
components or instances have not yet been upgraded. Continuing with the above 
example, once the service upgrade is initiated, the {{sleeper}} component and both 
its instances will be in state {{NEEDS_UPGRADE}}. After triggering the upgrade of 
{{sleeper-0}}, it will become {{STABLE}} at some point. However, {{sleeper-1}} 
will still be in the {{NEEDS_UPGRADE}} state, which indicates that this instance 
still needs to be upgraded.
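
For illustration, here is a minimal Java sketch of issuing the Step 1 PUT, assuming 
the API server at localhost:8088 and the same {{/app/v1/services/<service-name>}} 
path used for flex requests; the endpoint details are assumptions, not part of this 
patch:
{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ServiceUpgradeClient {
  public static void main(String[] args) throws Exception {
    // Assumed endpoint; in practice the host/port and service name vary.
    URL url = new URL("http://localhost:8088/app/v1/services/test1");
    String spec = "{\"name\":\"test1\",\"version\":\"v2\",\"state\":\"UPGRADING\"}";
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/json");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(spec.getBytes(StandardCharsets.UTF_8));  // send the newer spec
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}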


was (Author: csingh):
[~eyang]

Upgrade is either 2 steps when finalization is done automatically, or 3 steps 
when finalization is done manually:

Step 1: Initiate service level upgrade. This requires posting the newer spec. 
Here is an example:
{code:java}
{
  "name": "test1",
  "version": "v2",
  "state": "UPGRADING",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 120",
      "resource": {
        "cpus": 1,
        "memory": "256"
      }
    }
  ]
}{code}
This JSON is the spec JSON, not the state JSON. 

There is no {{NEEDS_UPGRADE}} state at the service level, so the JSON that you 
posted in your example for the service level is incorrect.

 

Step 2: Trigger upgrade of component or individual component instances.

An example of this request is
{code:java}
{
"state": "UPGRADING",
"component_instance_name": "sleeper-0"
}{code}

> Yarn Service Upgrade: add support to upgrade a component instance 
> --
>
> Key: YARN-7939
> URL: https://issues.apache.org/jira/browse/YARN-7939
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-7939.001.patch, YARN-7939.002.patch, 
> YARN-7939.003.patch
>
>
> Yarn core supports in-place upgrade of containers. A yarn service can 
> leverage that to provide in-place upgrade of component instances. Please see 
> YARN-7512 for details.
> Will add support to upgrade a single component instance first and then 
> iteratively add other APIs and features.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7939) Yarn Service Upgrade: add support to upgrade a component instance

2018-04-09 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430951#comment-16430951
 ] 

Chandni Singh commented on YARN-7939:
-

[~eyang]

Upgrade is either 2 steps when finalization is done automatically, or 3 steps 
when finalization is done manually:

Step 1: Initiate service level upgrade. This requires posting the newer spec. 
Here is an example:
{code:java}
{
  "name": "test1",
  "version": "v2",
  "state": "UPGRADING",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 120",
      "resource": {
        "cpus": 1,
        "memory": "256"
      }
    }
  ]
}{code}
This JSON is the spec JSON, not the state JSON. 

There is no {{NEEDS_UPGRADE}} state at the service level, so the JSON that you 
posted in your example for the service level is incorrect.

 

Step 2: Trigger upgrade of component or individual component instances.

An example of this request is
{code:java}
{
"state": "UPGRADING",
"component_instance_name": "sleeper-0"
}{code}

> Yarn Service Upgrade: add support to upgrade a component instance 
> --
>
> Key: YARN-7939
> URL: https://issues.apache.org/jira/browse/YARN-7939
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-7939.001.patch, YARN-7939.002.patch, 
> YARN-7939.003.patch
>
>
> Yarn core supports in-place upgrade of containers. A yarn service can 
> leverage that to provide in-place upgrade of component instances. Please see 
> YARN-7512 for details.
> Will add support to upgrade a single component instance first and then 
> iteratively add other APIs and features.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7941) Transitive dependencies for component are not resolved

2018-04-09 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430939#comment-16430939
 ] 

Billie Rinaldi edited comment on YARN-7941 at 4/9/18 5:35 PM:
--

I think I see the problem. The dependency readiness evaluation is checking 
whether the number of ready containers is less than the number of desired 
containers. But the number of desired containers is not being set until a flex 
event is issued for the component, so we are checking that the number of ready 
containers is not less than 0. I think we can fix this by initializing the 
number of desired containers in the Component constructor.
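
A minimal sketch of the described check and the proposed constructor fix 
(hypothetical names, not the actual Component code):
{code:java}
public class ComponentSketch {
  static class Component {
    private long desiredContainers;  // stays 0 until a flex event arrives
    private long readyContainers;

    Component(long instancesInSpec) {
      // Proposed fix: seed the desired count from the service spec here
      // instead of waiting for the first flex event.
      this.desiredContainers = instancesInSpec;
    }

    boolean dependencySatisfied() {
      // With desiredContainers == 0 this is trivially true, so dependents
      // (e.g. hbaseclient) could launch before regionserver was ready.
      return readyContainers >= desiredContainers;
    }
  }

  public static void main(String[] args) {
    Component regionserver = new Component(3);
    System.out.println("ready? " + regionserver.dependencySatisfied());
  }
}
{code}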


was (Author: billie.rinaldi):
I think I see the problem. The dependency readiness evaluation is checking 
whether the number of ready containers equals the number of desired containers. 
But the number of desired containers is not being set until a flex event is 
issued for the component, so we are checking that the number of ready 
containers is not less than 0. I think we can fix this by initializing the 
number of desired containers in the Component constructor.

> Transitive dependencies for component are not resolved 
> ---
>
> Key: YARN-7941
> URL: https://issues.apache.org/jira/browse/YARN-7941
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Priority: Major
>
> It is observed that transitive dependencies are not resolved; as a result, one 
> of the components is started earlier. 
> Ex : In HBase app, 
> master is an independent component, 
> regionserver depends on master,  
> hbaseclient depends on regionserver, 
> but I always see that HBaseClient is launched before regionserver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7941) Transitive dependencies for component are not resolved

2018-04-09 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430939#comment-16430939
 ] 

Billie Rinaldi commented on YARN-7941:
--

I think I see the problem. The dependency readiness evaluation is checking 
whether the number of ready containers is less than the number of desired containers. 
But the number of desired containers is not being set until a flex event is 
issued for the component, so we are checking that the number of ready 
containers is not less than 0. I think we can fix this by initializing the 
number of desired containers in the Component constructor.

> Transitive dependencies for component are not resolved 
> ---
>
> Key: YARN-7941
> URL: https://issues.apache.org/jira/browse/YARN-7941
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Priority: Major
>
> It is observed that transitive dependencies are not resolved; as a result, one 
> of the components is started earlier. 
> Ex : In HBase app, 
> master is an independent component, 
> regionserver depends on master,  
> hbaseclient depends on regionserver, 
> but I always see that HBaseClient is launched before regionserver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8131) Provide CLI option to DS for publishing entities into sub application

2018-04-09 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430934#comment-16430934
 ] 

Vrushali C edited comment on YARN-8131 at 4/9/18 5:28 PM:
--

I agree that the end user should not be concerned about data going to specific 
tables. The framework should be handling this, like the Tez AM.

In the distributed shell example, we should figure out if there is any data 
that is equivalent to a sub-app use case. If not, we should write a different 
one to test querying/writing out subapp data. It should not be a CLI option.

The flow name, version and flow run id are inputs as CLI options and that is 
different from a sub-app query. If we set the wrong example in DS, then it is 
likely to confuse frameworks about using subapp data. Let's have a good example 
for sub-application data. 

One thing is for sure: the data should be written "logically" by the AM to the 
timeline service without caring/knowing where exactly the data ends up in the 
backend. Meaning, if it is a flow-level value, it's stored at the flow level. 
If it's an application metric, it's at the app level. The AM need not be 
concerned that there are two tables at the backend, flow & application. All it 
should care about is that this particular value belongs at the flow level, that 
particular value makes sense at the app level, and some other third value makes 
sense at the sub-app level. 




was (Author: vrushalic):
I agree that the end user should not be concerned about data going to specific 
tables. The framework should be handling this, like the Tez AM.

In the distributed shell example, we should figure out if there is any data 
that is equivalent to a sub-app use case. If not, we should write a different 
one to test querying/writing out subapp data. It should not be a CLI option.

The flow name, version and flow run id are to CLI options and that is different 
from a sub-app query. If we set the wrong example in DS, then it is likely to 
confuse frameworks about using subapp data. Let's have a good example for 
sub-application data. 

One thing is for sure: the data should be written "logically" by the AM to the 
timeline service without caring/knowing where exactly the data ends up in the 
backend. Meaning, if it is a flow-level value, it's stored at the flow level. 
If it's an application metric, it's at the app level. The AM need not be 
concerned that there are two tables at the backend, flow & application. All it 
should care about is that this particular value belongs at the flow level, that 
particular value makes sense at the app level, and some other third value makes 
sense at the sub-app level. 



> Provide CLI option to DS for publishing entities into sub application
> -
>
> Key: YARN-8131
> URL: https://issues.apache.org/jira/browse/YARN-8131
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Priority: Major
>
> Post YARN-6936, TimelineV2Client exposes API to publish entities into sub 
> application table. We should add this CLI option in DS so that API can be 
> tested. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8131) Provide CLI option to DS for publishing entities into sub application

2018-04-09 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430934#comment-16430934
 ] 

Vrushali C commented on YARN-8131:
--

I agree that the end user should not be concerned about data going to specific 
tables. The framework should be handling this, like the Tez AM.

In the distributed shell example, we should figure out if there is any data 
that is equivalent to a sub-app use case. If not, we should write a different 
one to test querying/writing out subapp data. It should not be a CLI option.

The flow name, version and flow run id are inputs as CLI options and that is different 
from a sub-app query. If we set the wrong example in DS, then it is likely to 
confuse frameworks about using subapp data. Let's have a good example for 
sub-application data. 

One thing is for sure: the data should be written "logically" by the AM to the 
timeline service without caring/knowing where exactly the data ends up in the 
backend. Meaning, if it is a flow-level value, it's stored at the flow level. 
If it's an application metric, it's at the app level. The AM need not be 
concerned that there are two tables at the backend, flow & application. All it 
should care about is that this particular value belongs at the flow level, that 
particular value makes sense at the app level, and some other third value makes 
sense at the sub-app level. 



> Provide CLI option to DS for publishing entities into sub application
> -
>
> Key: YARN-8131
> URL: https://issues.apache.org/jira/browse/YARN-8131
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Priority: Major
>
> Post YARN-6936, TimelineV2Client exposes API to publish entities into sub 
> application table. We should add this CLI option in DS so that API can be 
> tested. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7930) Add configuration to initialize RM with configured labels.

2018-04-09 Thread Abhishek Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-7930:

Attachment: YARN-7930.003.patch

> Add configuration to initialize RM with configured labels.
> --
>
> Key: YARN-7930
> URL: https://issues.apache.org/jira/browse/YARN-7930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-7930.001.patch, YARN-7930.002.patch, 
> YARN-7930.003.patch
>
>
> At present, the only way to create labels is using the admin API. Sometimes, 
> there is a requirement to start the cluster with pre-configured node labels. 
> This Jira introduces yarn configurations to start the RM with predefined node 
> labels.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430886#comment-16430886
 ] 

genericqa commented on YARN-8133:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
36m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m 39s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8133 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918182/YARN-8133.01.patch |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 49d1b143969e 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ac32b35 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 341 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20272/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-8133.01.patch
>
>
> I see that the documentation link is broken from the overview page. 
> Clicking any link from the 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  page causes an error. 
> It looks like the Overview page is redirecting to a .md page which doesn't exist. 
> It should redirect to the *.html page



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7781) Update YARN-Services-Examples.md to be in sync with the latest code

2018-04-09 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430869#comment-16430869
 ] 

Gour Saha commented on YARN-7781:
-

[~jianhe] is it ok if I take over this jira and make the final few necessary 
changes?

> Update YARN-Services-Examples.md to be in sync with the latest code
> ---
>
> Key: YARN-7781
> URL: https://issues.apache.org/jira/browse/YARN-7781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Jian He
>Priority: Major
> Attachments: YARN-7781.01.patch, YARN-7781.02.patch, 
> YARN-7781.03.patch
>
>
> Update YARN-Services-Examples.md to make the following additions/changes:
> 1. Add an additional URL and PUT Request JSON to support flex:
> Update to flex up/down the number of containers (instances) of a component of a 
> service
> PUT URL – http://localhost:8088/app/v1/services/hello-world
> PUT Request JSON
> {code}
> {
>   "components" : [ {
> "name" : "hello",
> "number_of_containers" : 3
>   } ]
> }
> {code}
> 2. Modify all occurrences of /ws/ to /app/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""

2018-04-09 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430862#comment-16430862
 ] 

Chandni Singh commented on YARN-8116:
-

An empty retry-times list was being saved in the NMStore, which caused this 
exception. I have a simple fix for it. 
[~leftnoteasy] could you please review?
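
For reference, a minimal sketch of the failure mode and a defensive parse 
(hypothetical helper, not the actual state-store code):
{code:java}
import java.util.Arrays;

public class RetryTimesParse {
  public static void main(String[] args) {
    String raw = "";  // an empty retry-times entry recovered from the store
    // Long.parseLong("") throws NumberFormatException: For input string: ""
    long[] retryTimes = raw.isEmpty()
        ? new long[0]  // treat an empty entry as "no retry times recorded"
        : Arrays.stream(raw.split(",")).mapToLong(Long::parseLong).toArray();
    System.out.println(retryTimes.length + " retry times loaded");
  }
}
{code}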

> Nodemanager fails with NumberFormatException: For input string: ""
> --
>
> Key: YARN-8116
> URL: https://issues.apache.org/jira/browse/YARN-8116
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8116.001.patch
>
>
> Steps followed.
> 1) Update nodemanager debug delay config
> {code}
> <property>
>   <name>yarn.nodemanager.delete.debug-delay-sec</name>
>   <value>350</value>
> </property>
> {code}
> 2) Launch distributed shell application multiple times
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn  jar 
> hadoop-yarn-applications-distributedshell-*.jar  -shell_command "sleep 120" 
> -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> hadoop-yarn-applications-distributedshell-*.jar{code}
> 3) restart NM
> Nodemanager fails to start with below error.
> {code}
> {code:title=NM log}
> 2018-03-23 21:32:14,437 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: 
> true
> 2018-03-23 21:32:14,439 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set 
> as 3600. The logs will be aggregated every 3600 seconds
> 2018-03-23 21:32:14,455 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  failed in state INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960)
> 2018-03-23 21:32:14,458 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceStop(148)) - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
>  waiting for pending aggregation during exit
> 2018-03-23 21:32:14,460 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state 
> INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> 

[jira] [Updated] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8133:
-
Priority: Blocker  (was: Major)

> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-8133.01.patch
>
>
> I see that the documentation link is broken from the overview page. 
> Clicking any link from the 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  page causes an error. 
> It looks like the Overview page is redirecting to a .md page which doesn't exist. 
> It should redirect to the *.html page



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8133:
-
Target Version/s: 3.1.1

> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-8133.01.patch
>
>
> I see that the documentation link is broken from the overview page. 
> Clicking any link from the 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  page causes an error. 
> It looks like the Overview page is redirecting to a .md page which doesn't exist. 
> It should redirect to the *.html page



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8134) Support specifying node resources in SLS

2018-04-09 Thread Abhishek Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-8134:

Attachment: YARN-8134.patch

> Support specifying node resources in SLS
> 
>
> Key: YARN-8134
> URL: https://issues.apache.org/jira/browse/YARN-8134
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8134.patch
>
>
> At present, all nodes have the same resources in SLS. We need to add the 
> capability to assign different resources to different nodes in SLS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8134) Support specifying node resources in SLS

2018-04-09 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-8134:
---

 Summary: Support specifying node resources in SLS
 Key: YARN-8134
 URL: https://issues.apache.org/jira/browse/YARN-8134
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


At present, all nodes have the same resources in SLS. We need to add the 
capability to assign different resources to different nodes in SLS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""

2018-04-09 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8116:

Attachment: YARN-8116.001.patch

> Nodemanager fails with NumberFormatException: For input string: ""
> --
>
> Key: YARN-8116
> URL: https://issues.apache.org/jira/browse/YARN-8116
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8116.001.patch
>
>
> Steps followed.
> 1) Update nodemanager debug delay config
> {code}
> <property>
>   <name>yarn.nodemanager.delete.debug-delay-sec</name>
>   <value>350</value>
> </property>
> {code}
> 2) Launch distributed shell application multiple times
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn  jar 
> hadoop-yarn-applications-distributedshell-*.jar  -shell_command "sleep 120" 
> -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> hadoop-yarn-applications-distributedshell-*.jar{code}
> 3) restart NM
> Nodemanager fails to start with below error.
> {code}
> {code:title=NM log}
> 2018-03-23 21:32:14,437 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: 
> true
> 2018-03-23 21:32:14,439 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set 
> as 3600. The logs will be aggregated every 3600 seconds
> 2018-03-23 21:32:14,455 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  failed in state INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960)
> 2018-03-23 21:32:14,458 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceStop(148)) - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
>  waiting for pending aggregation during exit
> 2018-03-23 21:32:14,460 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state 
> INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   at 
> 

[jira] [Updated] (YARN-7667) Docker Stop grace period should be configurable

2018-04-09 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7667:
--
Attachment: YARN-7667.005.patch

> Docker Stop grace period should be configurable
> ---
>
> Key: YARN-7667
> URL: https://issues.apache.org/jira/browse/YARN-7667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-7667.001.patch, YARN-7667.002.patch, 
> YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch
>
>
> {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never 
> called. So, the stop uses the 10 second default grace period from docker
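
As a rough illustration of the wiring this implies, a sketch with stubbed types and 
an assumed configuration value (not the actual patch):
{code:java}
public class GracePeriodSketch {
  // Stub mirroring DockerStopCommand's unused setter described above.
  static class DockerStopCommand {
    private final String containerId;
    private int gracePeriod = 10;  // docker's own default: 10 seconds
    DockerStopCommand(String containerId) { this.containerId = containerId; }
    void setGracePeriod(int seconds) { this.gracePeriod = seconds; }
    @Override public String toString() {
      return "docker stop --time=" + gracePeriod + " " + containerId;
    }
  }

  public static void main(String[] args) {
    int configuredGracePeriod = 30;  // would come from a YARN config key
    DockerStopCommand stop = new DockerStopCommand("container_e01_0001_01_000002");
    stop.setGracePeriod(configuredGracePeriod);  // the call trunk never makes
    System.out.println(stop);
  }
}
{code}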



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable

2018-04-09 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430842#comment-16430842
 ] 

Eric Badger commented on YARN-7667:
---

Fixing checkstyle

> Docker Stop grace period should be configurable
> ---
>
> Key: YARN-7667
> URL: https://issues.apache.org/jira/browse/YARN-7667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-7667.001.patch, YARN-7667.002.patch, 
> YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch
>
>
> {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never 
> called. So, the stop uses the 10 second default grace period from docker



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8131) Provide CLI option to DS for publishing entities into sub application

2018-04-09 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430839#comment-16430839
 ] 

Rohith Sharma K S commented on YARN-8131:
-

We need to provide an option in the DS client CLI, similar to -flow_name, 
-flow_version and -flow_run_id. Based on the CLI option, the DS application master 
can decide whether to write into the sub application or not. This is to test the 
new API, because this new API is not integrated anywhere else. 
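
A rough sketch of that branching in the DS AM (stubbed client; the flag name and 
the sub-application publish signature are assumptions for illustration):
{code:java}
public class SubAppPublishSketch {
  // Stub standing in for TimelineV2Client; putSubAppEntities mirrors the
  // API added by YARN-6936, with the exact signature assumed here.
  interface TimelineClient {
    void putEntities(String entity);
    void putSubAppEntities(String entity);
  }

  // The AM would branch on the proposed CLI flag.
  static void publish(TimelineClient client, String entity, boolean subAppFlag) {
    if (subAppFlag) {
      client.putSubAppEntities(entity);  // write into the sub-application table
    } else {
      client.putEntities(entity);        // regular application-level write
    }
  }

  public static void main(String[] args) {
    TimelineClient stdout = new TimelineClient() {
      public void putEntities(String e) { System.out.println("app: " + e); }
      public void putSubAppEntities(String e) { System.out.println("subapp: " + e); }
    };
    publish(stdout, "DS_CONTAINER entity", true);
  }
}
{code}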

> Provide CLI option to DS for publishing entities into sub application
> -
>
> Key: YARN-8131
> URL: https://issues.apache.org/jira/browse/YARN-8131
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Priority: Major
>
> Post YARN-6936, TimelineV2Client exposes API to publish entities into sub 
> application table. We should add this CLI option in DS so that API can be 
> tested. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8131) Provide CLI option to DS for publishing entities into sub application

2018-04-09 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430813#comment-16430813
 ] 

Haibo Chen commented on YARN-8131:
--

[~rohithsharma] I don't think we should let the end user decide whether 
entities are posted into sub application or not.

The framework, DistributedShell in this case, should decide.  The end user, or 
user of the CLI, cares about the data, not how it is stored in ATSv2 by DS. [~vrushalic] 
What's your take on this?

> Provide CLI option to DS for publishing entities into sub application
> -
>
> Key: YARN-8131
> URL: https://issues.apache.org/jira/browse/YARN-8131
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Priority: Major
>
> Post YARN-6936, TimelineV2Client exposes API to publish entities into sub 
> application table. We should add this CLI option in DS so that API can be 
> tested. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""

2018-04-09 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh reassigned YARN-8116:
---

Assignee: Chandni Singh

> Nodemanager fails with NumberFormatException: For input string: ""
> --
>
> Key: YARN-8116
> URL: https://issues.apache.org/jira/browse/YARN-8116
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Critical
>
> Steps followed.
> 1) Update nodemanager debug delay config
> {code}
> <property>
>   <name>yarn.nodemanager.delete.debug-delay-sec</name>
>   <value>350</value>
> </property>
> {code}
> 2) Launch distributed shell application multiple times
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn  jar 
> hadoop-yarn-applications-distributedshell-*.jar  -shell_command "sleep 120" 
> -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> hadoop-yarn-applications-distributedshell-*.jar{code}
> 3) restart NM
> Nodemanager fails to start with below error.
> {code}
> {code:title=NM log}
> 2018-03-23 21:32:14,437 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: 
> true
> 2018-03-23 21:32:14,439 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set 
> as 3600. The logs will be aggregated every 3600 seconds
> 2018-03-23 21:32:14,455 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  failed in state INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960)
> 2018-03-23 21:32:14,458 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceStop(148)) - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
>  waiting for pending aggregation during exit
> 2018-03-23 21:32:14,460 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state 
> INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> 

[jira] [Commented] (YARN-7574) Add support for Node Labels on Auto Created Leaf Queue Template

2018-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430802#comment-16430802
 ] 

Hudson commented on YARN-7574:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13942 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13942/])
YARN-7574. Add support for Node Labels on Auto Created Leaf Queue (sunilg: rev 
821b0de4c59156d4a65112de03ba3e7e1c88e309)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ManagedParentQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/PendingAskUpdateResult.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/queuemanagement/GuaranteedOrZeroCapacityOverTimePolicy.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerAutoCreatedQueueBase.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueManagementDynamicEditPolicy.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AutoCreatedLeafQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/Allocation.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerAutoQueueCreation.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AutoCreatedQueueManagementPolicy.java


> Add support for Node Labels on Auto Created Leaf Queue Template
> ---
>
> Key: YARN-7574
> URL: https://issues.apache.org/jira/browse/YARN-7574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7574.1.patch, YARN-7574.10.patch, 
> YARN-7574.11.patch, YARN-7574.12.patch, YARN-7574.2.patch, YARN-7574.3.patch, 
> YARN-7574.4.patch, YARN-7574.5.patch, YARN-7574.6.patch, YARN-7574.7.patch, 
> YARN-7574.8.patch, YARN-7574.9.patch
>
>
> YARN-7473 adds support for auto created leaf queues to inherit node labels 
> capacities from parent queues. However, there is no support for the leaf queue 
> template to allow different configured capacities for different node labels. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To 

[jira] [Updated] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-09 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-8133:

Attachment: YARN-8133.01.patch

> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8133.01.patch
>
>
> I see that the documentation link is broken from the overview page. 
> Clicking any link from the 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  page causes an error. 
> It looks like the Overview page is redirecting to a .md page which doesn't exist. 
> It should redirect to the *.html page



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-09 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-8133:
---

 Summary: Doc link broken for yarn-service from overview page.
 Key: YARN-8133
 URL: https://issues.apache.org/jira/browse/YARN-8133
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Affects Versions: 3.1.0
Reporter: Rohith Sharma K S


I see that the documentation link is broken from the overview page. 

Clicking any link from the 
http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
 page causes an error. 

It looks like the Overview page is redirecting to a .md page which doesn't exist. It 
should redirect to the *.html page



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable

2018-04-09 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430788#comment-16430788
 ] 

genericqa commented on YARN-7667:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
33s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 24s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 237 unchanged - 0 fixed = 238 total (was 237) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  0s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m 17s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
31s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}111m 55s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.api.impl.TestTimelineClientV2Impl |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7667 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918160/YARN-7667.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 

[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-04-09 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430726#comment-16430726
 ] 

Eric Badger commented on YARN-7221:
---

bq. Hi Eric Badger Jason Lowe, do we agree on the last change to check 
submitting user for sudo privileges instead of 
yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user?
Yep, I agree with that.

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, 
> YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, 
> YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, 
> YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, 
> YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch
>
>
> When a Docker container is running with privileges, the majority use case is 
> to have some program start as root and then drop privileges to another user, 
> e.g. httpd starting privileged to bind to port 80, then dropping privileges 
> to the www user.  
> # We should add a security check for submitting users, to verify that they 
> have "sudo" access to run privileged containers.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with both the --privileged=true and --user=uid:gid 
> flags.  With this parameter combination, the user will not be able to become 
> the root user: every docker exec command will run as the uid:gid user 
> instead of being granted privileges.  A user can gain root privileges if the 
> container file system contains files that grant extra power, but such an 
> image is considered dangerous, and a non-privileged user could launch a 
> container with special bits set to acquire the same level of root power.  
> Hence, we lose control of which images may be run with --privileged and of 
> who has sudo rights to use privileged container images.  As a result, we 
> should check for sudo access and then decide whether to parameterize 
> --privileged=true OR --user=uid:gid.  This will avoid leading developers 
> down the wrong path.
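> A hypothetical sketch of that decision follows (the method names are 
> illustrative only, not taken from the actual patch):
> {code:java}
> // If the submitting user has sudo access, allow --privileged=true and do
> // not pass --user; otherwise pass --user=uid:gid and refuse privileges.
> if (hasSudoAccess(submittingUser)) {
>   runCommand.setPrivileged();
> } else {
>   runCommand.setUser(uid + ":" + gid);
> }
> {code}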



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-04-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430715#comment-16430715
 ] 

Jason Lowe commented on YARN-7221:
--

bq. do we agree on the last change to check submitting user for sudo privileges 
instead of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user?

Yes, that sounds like an appropriate change.

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, 
> YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, 
> YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, 
> YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, 
> YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch
>
>
> When a Docker container is running with privileges, the majority use case is 
> to have some program start as root and then drop privileges to another user, 
> e.g. httpd starting privileged to bind to port 80, then dropping privileges 
> to the www user.  
> # We should add a security check for submitting users, to verify that they 
> have "sudo" access to run privileged containers.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with both the --privileged=true and --user=uid:gid 
> flags.  With this parameter combination, the user will not be able to become 
> the root user: every docker exec command will run as the uid:gid user 
> instead of being granted privileges.  A user can gain root privileges if the 
> container file system contains files that grant extra power, but such an 
> image is considered dangerous, and a non-privileged user could launch a 
> container with special bits set to acquire the same level of root power.  
> Hence, we lose control of which images may be run with --privileged and of 
> who has sudo rights to use privileged container images.  As a result, we 
> should check for sudo access and then decide whether to parameterize 
> --privileged=true OR --user=uid:gid.  This will avoid leading developers 
> down the wrong path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7996) Allow user supplied Docker client configurations with YARN native services

2018-04-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430708#comment-16430708
 ] 

Jason Lowe commented on YARN-7996:
--

Thanks for the patch!  Looks good to me overall.  I agree with Billie's comment 
about clarifying what's expected in the config field.

Nit: The following code should have a debug-log-enabled check at the front of 
the conditional:
{code}
  if (tokens != null && tokens.length != 0) {
for (Token token : tokens) {
  LOG.debug("Got DT: " + token);
}
  }
{code}
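A minimal sketch of the guarded form, assuming the commons-logging style 
{{LOG}} from the snippet above:
{code}
  // Guarding the whole conditional skips the loop entirely when debug
  // logging is disabled.
  if (LOG.isDebugEnabled() && tokens != null && tokens.length != 0) {
    for (Token token : tokens) {
      LOG.debug("Got DT: " + token);
    }
  }
{code}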

It's a little odd for validateDockerClientConfiguration to go through the 
motions to build a full URI but then limit that URI check to a particular 
filesystem.  Is that intentional, or should it be calling Path#getFileSystem 
instead of assuming it should use fs.getFileSystem()?
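For reference, the alternative being suggested would look roughly like this 
(the path and configuration variable names are hypothetical):
{code}
// Resolve the FileSystem from the URI of the path itself instead of
// assuming a particular filesystem.
FileSystem fs = dockerClientConfigPath.getFileSystem(conf);
{code}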


> Allow user supplied Docker client configurations with YARN native services
> --
>
> Key: YARN-7996
> URL: https://issues.apache.org/jira/browse/YARN-7996
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7996.001.patch, YARN-7996.002.patch, 
> YARN-7996.003.patch, YARN-7996.004.patch
>
>
> YARN-5428 added support to distributed shell for supplying a Docker client 
> configuration at application submission time. The auth tokens within the 
> client configuration are then used to pull images from private Docker 
> repositories/registries. Add the same support to the YARN Native Services 
> framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer

2018-04-09 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430677#comment-16430677
 ] 

Billie Rinaldi commented on YARN-7962:
--

[~wilfreds], any further comments about patch 3, given that try/finally is the 
recommended best practice?

> Race Condition When Stopping DelegationTokenRenewer
> ---
>
> Key: YARN-7962
> URL: https://issues.apache.org/jira/browse/YARN-7962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, 
> YARN-7962.4.patch
>
>
> [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java]
> {code:java}
>   private ThreadPoolExecutor renewerService;
>   private void processDelegationTokenRenewerEvent(
>   DelegationTokenRenewerEvent evt) {
> serviceStateLock.readLock().lock();
> try {
>   if (isServiceStarted) {
> renewerService.execute(new DelegationTokenRenewerRunnable(evt));
>   } else {
> pendingEventQueue.add(evt);
>   }
> } finally {
>   serviceStateLock.readLock().unlock();
> }
>   }
>   @Override
>   protected void serviceStop() {
> if (renewalTimer != null) {
>   renewalTimer.cancel();
> }
> appTokens.clear();
> allTokens.clear();
> this.renewerService.shutdown();
> {code}
> {code:java}
> 2018-02-21 11:18:16,253  FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2
>  rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> What I think is going on here is that the {{serviceStop}} method is not 
> setting the {{isServiceStarted}} flag to 'false'.
> Please update so that the {{serviceStop}} method grabs the 
> {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before 
> shutting down the {{renewerService}} thread pool, to avoid this condition.
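> A minimal sketch of the suggested ordering, reusing the field names from the 
> snippet above (an illustration, not the committed patch):
> {code:java}
>   @Override
>   protected void serviceStop() {
>     // Flip the flag under the write lock first so that a concurrent
>     // processDelegationTokenRenewerEvent either sees the executor still
>     // running or queues the event instead of submitting it.
>     serviceStateLock.writeLock().lock();
>     try {
>       isServiceStarted = false;
>     } finally {
>       serviceStateLock.writeLock().unlock();
>     }
>     if (renewalTimer != null) {
>       renewalTimer.cancel();
>     }
>     appTokens.clear();
>     allTokens.clear();
>     this.renewerService.shutdown();
>   }
> {code}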



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7667) Docker Stop grace period should be configurable

2018-04-09 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430583#comment-16430583
 ] 

Eric Badger commented on YARN-7667:
---

[~shaneku...@gmail.com], thanks for the review! 

bq. Is there a reason not to set the value for 
yarn.nodemanager.runtime.linux.docker.stop.grace-period in yarn-default.xml to 
10?
Not particularly. I was just leaving it to be set by the user while relying on 
the default value set in YarnConfiguration. I'm not sure of the convention 
there, so I went ahead and followed your comment and updated the default in 
yarn-default.xml to 10. 
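For reference, the resulting yarn-default.xml entry would look something like 
this (the description element is omitted here):
{code:xml}
<property>
  <name>yarn.nodemanager.runtime.linux.docker.stop.grace-period</name>
  <value>10</value>
</property>
{code}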

bq. I don't think the new DockerStopCommand constructor is necessary, new 
DockerStopCommand(containerId).setGracePeriod(dockerStopGracePeriod) would 
achieve the same.
Fair enough. I removed the additional constructor and replaced the invocations 
with {{new DockerStopCommand(containerId).setGracePeriod(dockerStopGracePeriod)}}.

> Docker Stop grace period should be configurable
> ---
>
> Key: YARN-7667
> URL: https://issues.apache.org/jira/browse/YARN-7667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-7667.001.patch, YARN-7667.002.patch, 
> YARN-7667.003.patch, YARN-7667.004.patch
>
>
> {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never 
> called, so the stop uses Docker's default 10-second grace period.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7667) Docker Stop grace period should be configurable

2018-04-09 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7667:
--
Attachment: YARN-7667.004.patch

> Docker Stop grace period should be configurable
> ---
>
> Key: YARN-7667
> URL: https://issues.apache.org/jira/browse/YARN-7667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-7667.001.patch, YARN-7667.002.patch, 
> YARN-7667.003.patch, YARN-7667.004.patch
>
>
> {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never 
> called, so the stop uses Docker's default 10-second grace period.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8132) Final Status of applications shown as UNDEFINED in ATS app queries

2018-04-09 Thread Charan Hebri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charan Hebri updated YARN-8132:
---
Description: 
Final Status is shown as UNDEFINED for applications that are KILLED/FAILED. A 
sample request/response with the INFO field for an application:
{noformat}
2018-04-09 13:10:02,126 INFO  reader.TimelineReaderWebServices 
(TimelineReaderWebServices.java:getApp(1693)) - Received URL 
/ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO from user hrt_qa
2018-04-09 13:10:02,156 INFO  reader.TimelineReaderWebServices 
(TimelineReaderWebServices.java:getApp(1716)) - Processed URL 
/ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO (Took 30 
ms.){noformat}
{noformat}
{
  "metrics": [],
  "events": [],
  "createdtime": 1523263360719,
  "idprefix": 0,
  "id": "application_1523259757659_0003",
  "type": "YARN_APPLICATION",
  "info": {
"YARN_APPLICATION_CALLER_CONTEXT": "CLI",
"YARN_APPLICATION_DIAGNOSTICS_INFO": "Application 
application_1523259757659_0003 was killed by user xxx_xx at XXX.XXX.XXX.XXX",
"YARN_APPLICATION_FINAL_STATUS": "UNDEFINED",
"YARN_APPLICATION_NAME": "Sleep job",
"YARN_APPLICATION_USER": "hrt_qa",
"YARN_APPLICATION_UNMANAGED_APPLICATION": false,
"FROM_ID": 
"yarn-cluster!hrt_qa!test_flow!1523263360719!application_1523259757659_0003",
"UID": "yarn-cluster!application_1523259757659_0003",
"YARN_APPLICATION_VIEW_ACLS": " ",
"YARN_APPLICATION_SUBMITTED_TIME": 1523263360718,
"YARN_AM_CONTAINER_LAUNCH_COMMAND": [
  "$JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp 
-Dlog4j.configuration=container-log4j.properties 
-Dyarn.app.container.log.dir= -Dyarn.app.container.log.filesize=0 
-Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog 
-Dhdp.version=3.0.0.0-1163 -Xmx819m -Dhdp.version=3.0.0.0-1163 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/stdout 
2>/stderr "
],
"YARN_APPLICATION_QUEUE": "default",
"YARN_APPLICATION_TYPE": "MAPREDUCE",
"YARN_APPLICATION_PRIORITY": 0,
"YARN_APPLICATION_LATEST_APP_ATTEMPT": 
"appattempt_1523259757659_0003_01",
"YARN_APPLICATION_TAGS": [
  "timeline_flow_name_tag:test_flow"
],
"YARN_APPLICATION_STATE": "KILLED"
  },
  "configs": {},
  "isrelatedto": {},
  "relatesto": {}
}{noformat}
This differs from what the Resource Manager reports: for KILLED applications 
the final status is KILLED, and for FAILED applications it is FAILED. This 
behavior is seen in ATSv2 as well as in older versions of ATS. 



> Final 

[jira] [Created] (YARN-8132) Final Status of applications shown as UNDEFINED in ATS app queries

2018-04-09 Thread Charan Hebri (JIRA)
Charan Hebri created YARN-8132:
--

 Summary: Final Status of applications shown as UNDEFINED in ATS 
app queries
 Key: YARN-8132
 URL: https://issues.apache.org/jira/browse/YARN-8132
 Project: Hadoop YARN
  Issue Type: Bug
  Components: ATSv2, timelineservice
Reporter: Charan Hebri


Final Status is shown as UNDEFINED for applications that are KILLED/FAILED. A 
sample request/response with the INFO field for an application:
{noformat}
2018-04-09 13:10:02,126 INFO  reader.TimelineReaderWebServices 
(TimelineReaderWebServices.java:getApp(1693)) - Received URL 
/ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO from user hrt_qa

2018-04-09 13:10:02,156 INFO  reader.TimelineReaderWebServices 
(TimelineReaderWebServices.java:getApp(1716)) - Processed URL 
/ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO (Took 30 
ms.){noformat}
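The query can be reproduced directly against the timeline reader with 
something like the following (host and port are placeholders; 8188 is the 
usual reader webapp default):
{noformat}
curl "http://<timeline-reader-host>:8188/ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO"
{noformat}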
{noformat}
{
  "metrics": [],
  "events": [],
  "createdtime": 1523263360719,
  "idprefix": 0,
  "id": "application_1523259757659_0003",
  "type": "YARN_APPLICATION",
  "info": {
"YARN_APPLICATION_CALLER_CONTEXT": "CLI",
"YARN_APPLICATION_DIAGNOSTICS_INFO": "Application 
application_1523259757659_0003 was killed by user xxx_xx at XXX.XXX.XXX.XXX",
"YARN_APPLICATION_FINAL_STATUS": "UNDEFINED",
"YARN_APPLICATION_NAME": "Sleep job",
"YARN_APPLICATION_USER": "hrt_qa",
"YARN_APPLICATION_UNMANAGED_APPLICATION": false,
"FROM_ID": 
"yarn-cluster!hrt_qa!test_flow!1523263360719!application_1523259757659_0003",
"UID": "yarn-cluster!application_1523259757659_0003",
"YARN_APPLICATION_VIEW_ACLS": " ",
"YARN_APPLICATION_SUBMITTED_TIME": 1523263360718,
"YARN_AM_CONTAINER_LAUNCH_COMMAND": [
  "$JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp 
-Dlog4j.configuration=container-log4j.properties 
-Dyarn.app.container.log.dir= -Dyarn.app.container.log.filesize=0 
-Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog 
-Dhdp.version=3.0.0.0-1163 -Xmx819m -Dhdp.version=3.0.0.0-1163 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/stdout 
2>/stderr "
],
"YARN_APPLICATION_QUEUE": "default",
"YARN_APPLICATION_TYPE": "MAPREDUCE",
"YARN_APPLICATION_PRIORITY": 0,
"YARN_APPLICATION_LATEST_APP_ATTEMPT": 
"appattempt_1523259757659_0003_01",
"YARN_APPLICATION_TAGS": [
  "timeline_flow_name_tag:test_flow"
],
"YARN_APPLICATION_STATE": "KILLED"
  },
  "configs": {},
  "isrelatedto": {},
  "relatesto": {}
}{noformat}
This differs from what the Resource Manager reports: for KILLED applications 
the final status is KILLED, and for FAILED applications it is FAILED. This 
behavior is seen in ATSv2 as well as in older versions of ATS. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


