[jira] [Commented] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433441#comment-16433441
 ] 

genericqa commented on YARN-8127:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
24s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
25s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 25s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
25s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  4m  
4s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
26s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 25s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 48m 23s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8127 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918508/YARN-8127.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d2cc043fa1c6 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7c9cdad |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-YARN-Build/20297/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| compile | 
https://builds.apache.org/job/PreCommit-YARN-Build/20297/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| javac | 

[jira] [Commented] (YARN-8142) yarn service application stops when AM is killed with SIGTERM

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433405#comment-16433405
 ] 

Eric Yang commented on YARN-8142:
-

[~cheersyang] If the AM is hanging, it is unlikely to terminate gracefully on 
SIGTERM.  I think SIGKILL would be the right way to handle this, and then let 
the RM restart it.

> yarn service application stops when AM is killed with SIGTERM
> -
>
> Key: YARN-8142
> URL: https://issues.apache.org/jira/browse/YARN-8142
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Billie Rinaldi
>Priority: Major
>
> Steps:
> 1) Launch sleeper job ( non-docker yarn service)
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> fault-test-am-sleeper 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms
> 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
> application_1522887500374_0010
> Exit Code: 0{code}
> 2) Wait for sleeper component to be up
> 3) Kill AM process PID
>  
> Expected behavior:
> New attempt of AM will be started. The pre-existing container will keep 
> running
>  
> Actual behavior:
> Application finishes with State : FINISHED and Final-State : ENDED
> New attempt was never launched
> Note: 
> when the AM gets a SIGTERM, it gracefully shuts itself down; in doing so it 
> shuts the entire app down instead of letting it continue to run for another 
> attempt
>  






[jira] [Commented] (YARN-7402) Federation V2: Global Optimizations

2018-04-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433404#comment-16433404
 ] 

Wangda Tan commented on YARN-7402:
--

[~curino] / [~subru], thanks for working on this improvement. Is there a 
design/explanation doc so we can understand the overall idea and scope?

> Federation V2: Global Optimizations
> ---
>
> Key: YARN-7402
> URL: https://issues.apache.org/jira/browse/YARN-7402
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Major
>
> YARN Federation today requires manual configuration of queues within each 
> sub-cluster, and each RM operates "in isolation". This has a few issues:
> # Preemption is computed locally (and might far exceed the global need)
> # Jobs within a queue are forced to consume their resources "evenly" based on 
> queue mapping
> This umbrella JIRA tracks a new feature that leverages the 
> FederationStateStore as a synchronization mechanism among RMs, and allows for 
> allocation and preemption decisions to be based on a (close to up-to-date) 
> global view of the cluster allocation and demand. The JIRA also tracks 
> algorithms to automatically generate policies for Router and AMRMProxy to 
> shape the traffic to each sub-cluster, and general "maintenance" of the 
> FederationStateStore.






[jira] [Updated] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-8127:
---
Attachment: YARN-8127.003.patch

> Resource leak when async scheduling is enabled
> --
>
> Key: YARN-8127
> URL: https://issues.apache.org/jira/browse/YARN-8127
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8127.001.patch, YARN-8127.002.patch, 
> YARN-8127.003.patch
>
>
> Brief steps to reproduce
>  # Enable async scheduling, 5 threads
>  # Submit a lot of jobs trying to exhaust cluster resource
>  # After a while, observe that the NM's allocated resource is more than the 
> resource requested by its allocated containers
> It looks like the commit phase does not handle reserved containers in a 
> synchronized way, causing some proposals to be incorrectly accepted; as a 
> result, resources are deducted multiple times for a container.






[jira] [Updated] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-8127:
---
Attachment: (was: YARN-8127.003.patch)

> Resource leak when async scheduling is enabled
> --
>
> Key: YARN-8127
> URL: https://issues.apache.org/jira/browse/YARN-8127
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8127.001.patch, YARN-8127.002.patch
>
>
> Brief steps to reproduce
>  # Enable async scheduling, 5 threads
>  # Submit a lot of jobs trying to exhaust cluster resource
>  # After a while, observe that the NM's allocated resource is more than the 
> resource requested by its allocated containers
> It looks like the commit phase does not handle reserved containers in a 
> synchronized way, causing some proposals to be incorrectly accepted; as a 
> result, resources are deducted multiple times for a container.






[jira] [Updated] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-8127:
---
Attachment: YARN-8127.003.patch

> Resource leak when async scheduling is enabled
> --
>
> Key: YARN-8127
> URL: https://issues.apache.org/jira/browse/YARN-8127
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8127.001.patch, YARN-8127.002.patch, 
> YARN-8127.003.patch
>
>
> Brief steps to reproduce
>  # Enable async scheduling, 5 threads
>  # Submit a lot of jobs trying to exhaust cluster resource
>  # After a while, observe that the NM's allocated resource is more than the 
> resource requested by its allocated containers
> It looks like the commit phase does not handle reserved containers in a 
> synchronized way, causing some proposals to be incorrectly accepted; as a 
> result, resources are deducted multiple times for a container.






[jira] [Updated] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-8127:
---
Attachment: (was: YARN-8127.003.patch)

> Resource leak when async scheduling is enabled
> --
>
> Key: YARN-8127
> URL: https://issues.apache.org/jira/browse/YARN-8127
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8127.001.patch, YARN-8127.002.patch
>
>
> Brief steps to reproduce
>  # Enable async scheduling, 5 threads
>  # Submit a lot of jobs trying to exhaust cluster resource
>  # After a while, observe that the NM's allocated resource is more than the 
> resource requested by its allocated containers
> It looks like the commit phase does not handle reserved containers in a 
> synchronized way, causing some proposals to be incorrectly accepted; as a 
> result, resources are deducted multiple times for a container.






[jira] [Commented] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433384#comment-16433384
 ] 

genericqa commented on YARN-8127:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 67m 
15s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}118m 50s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8127 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918497/YARN-8127.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 29aad0064bc1 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d919eb6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20295/testReport/ |
| Max. process+thread count | 837 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20295/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Resource leak when async scheduling is enabled
> 

[jira] [Commented] (YARN-7142) Support placement policy in yarn native services

2018-04-10 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433376#comment-16433376
 ] 

Weiwei Yang commented on YARN-7142:
---

Hi [~leftnoteasy]/[~gsaha]

Is the patch for the 3.1 branch ready to commit? I see some checkstyle issues, 
but it looks like those already exist in trunk, so they probably don't matter. 
Please take a look, thanks.

> Support placement policy in yarn native services
> 
>
> Key: YARN-7142
> URL: https://issues.apache.org/jira/browse/YARN-7142
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7142-branch-3.1.004.patch, YARN-7142.001.patch, 
> YARN-7142.002.patch, YARN-7142.003.patch, YARN-7142.004.patch
>
>
> Placement policy exists in the API but is not implemented yet.
> I have filed YARN-8074 to move the composite constraints implementation out 
> of this phase-1 implementation of placement policy.






[jira] [Commented] (YARN-7941) Transitive dependencies for component are not resolved

2018-04-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433373#comment-16433373
 ] 

Hudson commented on YARN-7941:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13964 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13964/])
YARN-7941. Transitive dependencies for component are not resolved. 
(rohithsharmaks: rev c0487110990958fa985d273eb178bdf76002cf3a)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestYarnNativeServices.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/Component.java


> Transitive dependencies for component are not resolved 
> ---
>
> Key: YARN-7941
> URL: https://issues.apache.org/jira/browse/YARN-7941
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Billie Rinaldi
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7941.1.patch
>
>
> It is observed that transitive dependencies are not resolved; as a result, one 
> of the components is started earlier than it should be. 
> Ex: In an HBase app, 
> master is an independent component, 
> regionserver depends on master, and 
> hbaseclient depends on regionserver, 
> but I always see that HBaseClient is launched before regionserver.
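
For illustration of the description above, here is a minimal sketch (not the 
YARN-7941 patch itself; the component names and map-based dependency graph are 
assumptions): a component should launch only after all of its direct and 
transitive dependencies, so the dependency set has to be expanded transitively 
before deciding launch order.

{code:java}
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only -- not the actual Component.java change from the patch.
class TransitiveDependencySketch {

  static Set<String> transitiveDeps(String component,
      Map<String, List<String>> directDeps) {
    Set<String> resolved = new LinkedHashSet<>();
    Deque<String> stack = new ArrayDeque<>(
        directDeps.getOrDefault(component, Collections.emptyList()));
    while (!stack.isEmpty()) {
      String dep = stack.pop();
      if (resolved.add(dep)) {
        // also pull in the dependency's own dependencies
        stack.addAll(directDeps.getOrDefault(dep, Collections.emptyList()));
      }
    }
    return resolved;
  }

  public static void main(String[] args) {
    Map<String, List<String>> deps = new HashMap<>();
    deps.put("master", Collections.emptyList());
    deps.put("regionserver", Arrays.asList("master"));
    deps.put("hbaseclient", Arrays.asList("regionserver"));
    // hbaseclient must wait for regionserver and, transitively, master
    System.out.println(transitiveDeps("hbaseclient", deps));
  }
}
{code}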






[jira] [Comment Edited] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433368#comment-16433368
 ] 

Tao Yang edited comment on YARN-8127 at 4/11/18 4:00 AM:
-

Thanks [~cheersyang] for your suggestion.

LGTM. Attached the v3 patch to simplify the check logic.


was (Author: tao yang):
Attached v3 patch to simplify the check logic.

> Resource leak when async scheduling is enabled
> --
>
> Key: YARN-8127
> URL: https://issues.apache.org/jira/browse/YARN-8127
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8127.001.patch, YARN-8127.002.patch, 
> YARN-8127.003.patch
>
>
> Brief steps to reproduce
>  # Enable async scheduling, 5 threads
>  # Submit a lot of jobs trying to exhaust cluster resource
>  # After a while, observe that the NM's allocated resource is more than the 
> resource requested by its allocated containers
> It looks like the commit phase does not handle reserved containers in a 
> synchronized way, causing some proposals to be incorrectly accepted; as a 
> result, resources are deducted multiple times for a container.






[jira] [Commented] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433368#comment-16433368
 ] 

Tao Yang commented on YARN-8127:


Attached v3 patch to simplify the check logic.

> Resource leak when async scheduling is enabled
> --
>
> Key: YARN-8127
> URL: https://issues.apache.org/jira/browse/YARN-8127
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8127.001.patch, YARN-8127.002.patch, 
> YARN-8127.003.patch
>
>
> Brief steps to reproduce
>  # Enable async scheduling, 5 threads
>  # Submit a lot of jobs trying to exhaust cluster resource
>  # After a while, observe that the NM's allocated resource is more than the 
> resource requested by its allocated containers
> It looks like the commit phase does not handle reserved containers in a 
> synchronized way, causing some proposals to be incorrectly accepted; as a 
> result, resources are deducted multiple times for a container.






[jira] [Updated] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-8127:
---
Attachment: YARN-8127.003.patch

> Resource leak when async scheduling is enabled
> --
>
> Key: YARN-8127
> URL: https://issues.apache.org/jira/browse/YARN-8127
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8127.001.patch, YARN-8127.002.patch, 
> YARN-8127.003.patch
>
>
> Brief steps to reproduce
>  # Enable async scheduling, 5 threads
>  # Submit a lot of jobs trying to exhaust cluster resource
>  # After a while, observe that the NM's allocated resource is more than the 
> resource requested by its allocated containers
> It looks like the commit phase does not handle reserved containers in a 
> synchronized way, causing some proposals to be incorrectly accepted; as a 
> result, resources are deducted multiple times for a container.






[jira] [Commented] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433367#comment-16433367
 ] 

Weiwei Yang commented on YARN-8127:
---

Hi [~Tao Yang]

Thanks for the patch. While reading the commit logic, I think this check can be 
moved to {{FiCaSchedulerApp#commonCheckContainerAllocation}}: when a proposal 
allocates from a reserved container, we need to make sure the node still has 
that reserved container at that moment (in case another identical proposal was 
already committed). Could you please take a look?
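
A minimal sketch of the kind of guard being discussed above, assuming simplified 
stand-in types rather than the real CapacityScheduler API (this is not the 
YARN-8127 patch itself):

{code:java}
// Illustrative sketch only. The interfaces below are simplified stand-ins for
// the real RMContainer / SchedulerNode / allocation-proposal types, just to
// show the shape of the check being suggested.
class ReservedProposalCheckSketch {

  interface RMContainerLike { String getContainerId(); }

  interface SchedulerNodeLike { RMContainerLike getReservedContainer(); }

  interface ProposalLike { RMContainerLike getAllocateFromReservedContainer(); }

  /**
   * Returns false when a proposal that fulfils a reservation refers to a
   * reserved container the node no longer holds -- e.g. because another
   * async-scheduling thread already committed an identical proposal. Accepting
   * such a stale proposal would deduct the node's resources a second time.
   */
  static boolean commonCheckContainerAllocation(ProposalLike proposal,
      SchedulerNodeLike node) {
    RMContainerLike allocateFrom = proposal.getAllocateFromReservedContainer();
    if (allocateFrom == null) {
      return true; // a regular (non-reservation) allocation, nothing to check
    }
    RMContainerLike reservedOnNode = node.getReservedContainer();
    return reservedOnNode != null
        && reservedOnNode.getContainerId().equals(allocateFrom.getContainerId());
  }
}
{code}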

> Resource leak when async scheduling is enabled
> --
>
> Key: YARN-8127
> URL: https://issues.apache.org/jira/browse/YARN-8127
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8127.001.patch, YARN-8127.002.patch
>
>
> Brief steps to reproduce
>  # Enable async scheduling, 5 threads
>  # Submit a lot of jobs trying to exhaust cluster resource
>  # After a while, observe that the NM's allocated resource is more than the 
> resource requested by its allocated containers
> It looks like the commit phase does not handle reserved containers in a 
> synchronized way, causing some proposals to be incorrectly accepted; as a 
> result, resources are deducted multiple times for a container.






[jira] [Commented] (YARN-8138) No containers pre-empted from another queue when using node labels

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433362#comment-16433362
 ] 

genericqa commented on YARN-8138:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 9 new + 17 unchanged - 0 fixed = 26 total (was 17) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 48s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 47s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}140m  3s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoCreatedQueuePreemption
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8138 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918490/YARN-8138.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux fbddb0aa800e 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d919eb6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20294/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 

[jira] [Commented] (YARN-7941) Transitive dependencies for component are not resolved

2018-04-10 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433360#comment-16433360
 ] 

Rohith Sharma K S commented on YARN-7941:
-

committing shortly

> Transitive dependencies for component are not resolved 
> ---
>
> Key: YARN-7941
> URL: https://issues.apache.org/jira/browse/YARN-7941
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-7941.1.patch
>
>
> It is observed that transitive dependencies are not resolved; as a result, one 
> of the components is started earlier than it should be. 
> Ex: In an HBase app, 
> master is an independent component, 
> regionserver depends on master, and 
> hbaseclient depends on regionserver, 
> but I always see that HBaseClient is launched before regionserver.






[jira] [Commented] (YARN-8142) yarn service application stops when AM is killed with SIGTERM

2018-04-10 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433340#comment-16433340
 ] 

Weiwei Yang commented on YARN-8142:
---

I am thinking of another scenario: what if an app's AM hangs and the user wants 
to restart the AM without restarting the entire app? Would it be reasonable to 
allow an AM to be killed and then automatically restarted in another instance? 
To terminate the application (especially a long-running service), we would call 
a stop command instead of killing the AM directly. Does that make sense?
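
A rough sketch of the behaviour being discussed, assuming a hypothetical 
service-AM shutdown hook (this is not the actual yarn-native-services code): 
the whole service is torn down only when an explicit stop was requested; 
otherwise the AM exits without unregistering so the RM can launch a new attempt 
while existing containers keep running.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch only -- not the real ServiceMaster implementation.
class ServiceShutdownSketch {
  // Set only when a user issues an explicit stop (e.g. a "yarn app -stop"
  // style request handled by the AM).
  private final AtomicBoolean stopRequested = new AtomicBoolean(false);

  void installShutdownHook() {
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      if (stopRequested.get()) {
        // Explicit stop: unregister as FINISHED so the RM does not start
        // another attempt and the service ends.
        stopContainersAndUnregister();
      } else {
        // AM was killed (e.g. SIGTERM): exit without unregistering so the RM
        // starts a new AM attempt and existing containers keep running.
        exitWithoutUnregistering();
      }
    }));
  }

  void requestStop() { stopRequested.set(true); }

  private void stopContainersAndUnregister() { /* placeholder */ }

  private void exitWithoutUnregistering() { /* placeholder */ }
}
{code}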

> yarn service application stops when AM is killed with SIGTERM
> -
>
> Key: YARN-8142
> URL: https://issues.apache.org/jira/browse/YARN-8142
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Billie Rinaldi
>Priority: Major
>
> Steps:
> 1) Launch sleeper job ( non-docker yarn service)
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> fault-test-am-sleeper 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms
> 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
> application_1522887500374_0010
> Exit Code: 0{code}
> 2) Wait for sleeper component to be up
> 3) Kill AM process PID
>  
> Expected behavior:
> New attempt of AM will be started. The pre-existing container will keep 
> running
>  
> Actual behavior:
> Application finishes with State : FINISHED and Final-State : ENDED
> New attempt was never launched
> Note: 
> when the AM gets a SIGTERM, it gracefully shuts itself down; in doing so it 
> shuts the entire app down instead of letting it continue to run for another 
> attempt
>  






[jira] [Commented] (YARN-8143) Improve log message when Capacity Scheduler request allocation on node

2018-04-10 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1646#comment-1646
 ] 

Weiwei Yang commented on YARN-8143:
---

Hi [~Zian Chen]

Thanks for filing this ticket. Any proposal?

Since this is about log refinement, I have reduced the priority.
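
One possible direction, sketched purely as an illustration (not an actual 
YARN-8143 patch; the class, method, and parameter names here are hypothetical): 
demote the per-attempt message to DEBUG and log at INFO only when the 
reservation is actually fulfilled.

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical illustration for the discussion above, not real scheduler code.
class ReservationLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(ReservationLoggingSketch.class);

  void logReservationAttempt(String appId, String nodeId, boolean fulfilled) {
    // Printed on every scheduling attempt against the node: keep it at DEBUG.
    LOG.debug("Trying to fulfill reservation for application {} on node: {}",
        appId, nodeId);
    // Log at INFO only when something actually changed.
    if (fulfilled) {
      LOG.info("Fulfilled reservation for application {} on node: {}",
          appId, nodeId);
    }
  }
}
{code}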

> Improve log message when Capacity Scheduler request allocation on node
> --
>
> Key: YARN-8143
> URL: https://issues.apache.org/jira/browse/YARN-8143
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Critical
>
> When the scheduler allocates a container on a node with reserved containers 
> on it, this log message is printed very frequently; it needs to be improved 
> with more condition checks.
> {code:java}
> 2018-02-02 11:41:13,105 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:13,115 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> ctr-e137-1514896590304-52728-01-07.hwx.site:25454
> 2018-02-02 11:41:13,115 INFO  allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container  application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=
> {code}
>  
>  






[jira] [Updated] (YARN-8143) Improve log message when Capacity Scheduler request allocation on node

2018-04-10 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8143:
--
Priority: Minor  (was: Critical)

> Improve log message when Capacity Scheduler request allocation on node
> --
>
> Key: YARN-8143
> URL: https://issues.apache.org/jira/browse/YARN-8143
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Minor
>
> When the scheduler allocates a container on a node with reserved containers 
> on it, this log message is printed very frequently; it needs to be improved 
> with more condition checks.
> {code:java}
> 2018-02-02 11:41:13,105 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:13,115 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> ctr-e137-1514896590304-52728-01-07.hwx.site:25454
> 2018-02-02 11:41:13,115 INFO  allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container  application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=
> {code}
>  
>  






[jira] [Commented] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433317#comment-16433317
 ] 

Tao Yang commented on YARN-8127:


Attached v2 patch to fix UT and the check-style error

> Resource leak when async scheduling is enabled
> --
>
> Key: YARN-8127
> URL: https://issues.apache.org/jira/browse/YARN-8127
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8127.001.patch, YARN-8127.002.patch
>
>
> Brief steps to reproduce
>  # Enable async scheduling, 5 threads
>  # Submit a lot of jobs trying to exhaust cluster resource
>  # After a while, observe that the NM's allocated resource is more than the 
> resource requested by its allocated containers
> It looks like the commit phase does not handle reserved containers in a 
> synchronized way, causing some proposals to be incorrectly accepted; as a 
> result, resources are deducted multiple times for a container.






[jira] [Updated] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-8127:
---
Attachment: YARN-8127.002.patch

> Resource leak when async scheduling is enabled
> --
>
> Key: YARN-8127
> URL: https://issues.apache.org/jira/browse/YARN-8127
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8127.001.patch, YARN-8127.002.patch
>
>
> Brief steps to reproduce
>  # Enable async scheduling, 5 threads
>  # Submit a lot of jobs trying to exhaust cluster resource
>  # After a while, observe that the NM's allocated resource is more than the 
> resource requested by its allocated containers
> It looks like the commit phase does not handle reserved containers in a 
> synchronized way, causing some proposals to be incorrectly accepted; as a 
> result, resources are deducted multiple times for a container.






[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens

2018-04-10 Thread Chun Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433277#comment-16433277
 ] 

Chun Chen commented on YARN-2674:
-

[~shaneku...@gmail.com] Please carry on the work here. I no longer work on YARN.

> Distributed shell AM may re-launch containers if RM work preserving restart 
> happens
> ---
>
> Key: YARN-2674
> URL: https://issues.apache.org/jira/browse/YARN-2674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, resourcemanager
>Reporter: Chun Chen
>Assignee: Chun Chen
>Priority: Major
>  Labels: oct16-easy
> Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch, 
> YARN-2674.4.patch, YARN-2674.5.patch
>
>
> Currently, if an RM work-preserving restart happens while distributed shell is 
> running, the distributed shell AM may re-launch all the containers, including 
> new/running/completed ones. We must make sure it won't re-launch the 
> running/completed containers.
> We need to remove allocated containers from 
> AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. 
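
To illustrate the idea in the description above (a sketch only, not the 
YARN-2674 patch; the map below is a simplified stand-in for 
AMRMClientImpl#remoteRequestsTable): once a container is allocated, the matching 
outstanding ask is decremented so it is not re-sent, and hence not re-satisfied, 
after an RM work-preserving restart.

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for AMRMClientImpl#remoteRequestsTable -- illustration
// only, not the actual YARN-2674 change.
class OutstandingRequestSketch {
  // key: priority + resource name + capability; value: outstanding ask count
  private final Map<String, Integer> requestTable = new HashMap<>();

  void addAsk(String key) {
    requestTable.merge(key, 1, Integer::sum);
  }

  // Called when the AM receives allocated containers from the RM: each
  // allocation satisfies one outstanding ask, so decrement (and drop at zero)
  // to avoid re-requesting, and re-launching, after an RM restart.
  void onContainersAllocated(List<String> allocatedKeys) {
    for (String key : allocatedKeys) {
      requestTable.computeIfPresent(key,
          (k, pending) -> pending > 1 ? pending - 1 : null);
    }
  }

  int outstanding(String key) {
    return requestTable.getOrDefault(key, 0);
  }
}
{code}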






[jira] [Commented] (YARN-8138) No containers pre-empted from another queue when using node labels

2018-04-10 Thread Zian Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433270#comment-16433270
 ] 

Zian Chen commented on YARN-8138:
-

Uploaded a patch to add a new UT to TestCapacitySchedulerSurgicalPreemption.

> No containers pre-empted from another queue when using node labels
> --
>
> Key: YARN-8138
> URL: https://issues.apache.org/jira/browse/YARN-8138
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Assignee: Zian Chen
>Priority: Blocker
> Attachments: YARN-8138.001.patch
>
>
> There seems to be an issue with pre-emption when using node labels with queue 
> priority.
> Test configuration:
> queue A (capacity=50, priority=1)
> queue B (capacity=50, priority=2)
> both have accessible-node-labels set to x
> A.accessible-node-labels.x.capacity = 50
> B.accessible-node-labels.x.capacity = 50
> Along with this pre-emption related properties have been set.
> Test steps:
>  - Set NM memory = 6000MB and containerMemory = 750MB
>  - Submit an application A1 to B, with am-container = container = 
> (6000-750-1500), no. of containers = 2
>  - Submit an application A2 to A, with am-container = 750, container = 1500, 
> no of containers = (NUM_NM-1)
>  - Kill application A1
>  - Submit an application A3 to B with am-container=container=5000, no. of 
> containers=3
>  - Expectation is that containers are pre-empted from application A2 to A3 
> but there is no container pre-emption happening
> Container pre-emption is stuck with the message in the RM log,
> {noformat}
> 2018-02-02 11:41:36,974 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,994 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,995 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster={noformat}






[jira] [Updated] (YARN-8138) No containers pre-empted from another queue when using node labels

2018-04-10 Thread Zian Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-8138:

Attachment: YARN-8138.001.patch

> No containers pre-empted from another queue when using node labels
> --
>
> Key: YARN-8138
> URL: https://issues.apache.org/jira/browse/YARN-8138
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Assignee: Zian Chen
>Priority: Blocker
> Attachments: YARN-8138.001.patch
>
>
> There seems to be an issue with pre-emption when using node labels with queue 
> priority.
> Test configuration:
> queue A (capacity=50, priority=1)
> queue B (capacity=50, priority=2)
> both have accessible-node-labels set to x
> A.accessible-node-labels.x.capacity = 50
> B.accessible-node-labels.x.capacity = 50
> Along with this pre-emption related properties have been set.
> Test steps:
>  - Set NM memory = 6000MB and containerMemory = 750MB
>  - Submit an application A1 to B, with am-container = container = 
> (6000-750-1500), no. of containers = 2
>  - Submit an application A2 to A, with am-container = 750, container = 1500, 
> no of containers = (NUM_NM-1)
>  - Kill application A1
>  - Submit an application A3 to B with am-container=container=5000, no. of 
> containers=3
>  - Expectation is that containers are pre-empted from application A2 to A3 
> but there is no container pre-emption happening
> Container pre-emption is stuck with the message in the RM log,
> {noformat}
> 2018-02-02 11:41:36,974 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,994 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,995 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster={noformat}






[jira] [Commented] (YARN-8138) No containers pre-empted from another queue when using node labels

2018-04-10 Thread Zian Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433269#comment-16433269
 ] 

Zian Chen commented on YARN-8138:
-

I investigated this issue and wrote a UT to reproduce it. According to the UT, 
the conclusion is that preemption did happen after application 3 was submitted, 
but not as the test scenario expected. There are several issues we need to 
clarify here.
 # When we set the memory size for containers, we need to set it as a multiple 
of 1024 MB; otherwise, the scheduler will round it up to the nearest size that 
is a multiple of 1024 MB and larger than the requested size. For example, app3 
had an AM container request of 750 MB, but it will instead get 1024 MB as the 
container size.
 # According to the log, preemption seems not to have happened, but it actually 
happened with a long delay (probably 1 minute). The reason is that when we set the 
"yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.reserved-container-delay-ms"
 property, the reserved container will not be allocated before we hit this 
timeout, which delays preemption even further.
 # Although preemption happened, we should not expect A3 to be able to launch 
all of its requested containers, because the amount of resource A3 can get is 
limited by the minimum guaranteed resource of the queue the application was 
submitted to. In this case, we only expect two containers to be preempted, since 
Queue B will reach its minimum guaranteed resource (50% of the cluster resource) 
after two containers are preempted from Queue A.

So my suggestion is to recheck the test scenario with the issues mentioned above 
and adjust the settings accordingly; then the test should pass (points 1 and 2 
are sketched below, after this comment).

 

[~leftnoteasy] , could you share your opinions as well? Thanks
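
A small sketch of points 1 and 2 above, offered only as an illustration (the 
rounding helper and the 1024 MB increment reflect the behaviour described in 
point 1, and the 60-second value for the delay property in point 2 is just an 
example):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustration for the two points above; not part of any YARN-8138 patch.
class Yarn8138SettingsSketch {

  // Point 1: container requests are rounded up to a multiple of the minimum
  // allocation (1024 MB here), so a 750 MB AM request becomes a 1024 MB
  // container and a 5000 MB request becomes 5120 MB.
  static int roundUpToIncrement(int requestedMb, int incrementMb) {
    return ((requestedMb + incrementMb - 1) / incrementMb) * incrementMb;
  }

  public static void main(String[] args) {
    System.out.println(roundUpToIncrement(750, 1024));   // 1024
    System.out.println(roundUpToIncrement(5000, 1024));  // 5120

    // Point 2: with this property set, a reserved container is not allocated
    // before the delay expires, so preemption looks "stuck" for that long.
    Configuration conf = new Configuration();
    conf.setLong(
        "yarn.scheduler.capacity.ordering-policy.priority-utilization."
            + "underutilized-preemption.reserved-container-delay-ms",
        60000L);
  }
}
{code}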

> No containers pre-empted from another queue when using node labels
> --
>
> Key: YARN-8138
> URL: https://issues.apache.org/jira/browse/YARN-8138
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Assignee: Zian Chen
>Priority: Blocker
>
> There seems to be an issue with pre-emption when using node labels with queue 
> priority.
> Test configuration:
> queue A (capacity=50, priority=1)
> queue B (capacity=50, priority=2)
> both have accessible-node-labels set to x
> A.accessible-node-labels.x.capacity = 50
> B.accessible-node-labels.x.capacity = 50
> Along with this pre-emption related properties have been set.
> Test steps:
>  - Set NM memory = 6000MB and containerMemory = 750MB
>  - Submit an application A1 to B, with am-container = container = 
> (6000-750-1500), no. of containers = 2
>  - Submit an application A2 to A, with am-container = 750, container = 1500, 
> no of containers = (NUM_NM-1)
>  - Kill application A1
>  - Submit an application A3 to B with am-container=container=5000, no. of 
> containers=3
>  - Expectation is that containers are pre-empted from application A2 to A3 
> but there is no container pre-emption happening
> Container pre-emption is stuck with the message in the RM log,
> {noformat}
> 2018-02-02 11:41:36,974 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
> fulfill reservation for application application_1517571510094_0003 on node: 
> XX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
> Reserved container application=application_1517571510094_0003 
> resource= 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
>  cluster=
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,994 INFO 

[jira] [Updated] (YARN-8142) yarn service application stops when AM is killed with SIGTERM

2018-04-10 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8142:
-
Target Version/s: 3.2.0, 3.1.1

> yarn service application stops when AM is killed with SIGTERM
> -
>
> Key: YARN-8142
> URL: https://issues.apache.org/jira/browse/YARN-8142
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Billie Rinaldi
>Priority: Major
>
> Steps:
> 1) Launch sleeper job ( non-docker yarn service)
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> fault-test-am-sleeper 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms
> 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
> application_1522887500374_0010
> Exit Code: 0{code}
> 2) Wait for sleeper component to be up
> 3) Kill AM process PID
>  
> Expected behavior:
> New attempt of AM will be started. The pre-existing container will keep 
> running
>  
> Actual behavior:
> Application finishes with State : FINISHED and Final-State : ENDED
> New attempt was never launched
> Note: 
> When the AM gets a SIGTERM, it gracefully shuts itself down; it shuts the 
> entire app down instead of letting it continue to run for another attempt.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8060) Create default readiness check for service components

2018-04-10 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8060:
-
Target Version/s: 3.2.0, 3.1.1  (was: 3.2.0)

> Create default readiness check for service components
> -
>
> Key: YARN-8060
> URL: https://issues.apache.org/jira/browse/YARN-8060
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8060.1.patch, YARN-8060.2.patch, YARN-8060.3.patch
>
>
> It is currently possible for a component instance to have READY status before 
> the AM retrieves an IP for the container. We should make sure the IP has been 
> retrieved before marking the instance as READY.
> This default probe could also have an option to check for a DNS entry for the 
> instance's hostname if a DNS address is provided.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7941) Transitive dependencies for component are not resolved

2018-04-10 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-7941:
-
Target Version/s: 3.2.0, 3.1.1

> Transitive dependencies for component are not resolved 
> ---
>
> Key: YARN-7941
> URL: https://issues.apache.org/jira/browse/YARN-7941
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-7941.1.patch
>
>
> It is observed that transitive dependencies are not resolved; as a result, one 
> of the components is started earlier. 
> Ex: In an HBase app, 
> master is an independent component, 
> regionserver depends on master, and 
> hbaseclient depends on regionserver, 
> but I always see that HBaseClient is launched before regionserver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""

2018-04-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433249#comment-16433249
 ] 

Hudson commented on YARN-8116:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13963 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13963/])
YARN-8116. Nodemanager fails with NumberFormatException: For input (wangda: rev 
2bf9cc2c73944c9f7cde56714b8cf6995cfa539b)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java


> Nodemanager fails with NumberFormatException: For input string: ""
> --
>
> Key: YARN-8116
> URL: https://issues.apache.org/jira/browse/YARN-8116
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8116.001.patch, YARN-8116.002.patch
>
>
> Steps followed.
> 1) Update nodemanager debug delay config
> {code}
> <property>
>   <name>yarn.nodemanager.delete.debug-delay-sec</name>
>   <value>350</value>
> </property>
> {code}
> 2) Launch distributed shell application multiple times
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn  jar 
> hadoop-yarn-applications-distributedshell-*.jar  -shell_command "sleep 120" 
> -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> hadoop-yarn-applications-distributedshell-*.jar{code}
> 3) restart NM
> Nodemanager fails to start with the below error.
> {code:title=NM log}
> 2018-03-23 21:32:14,437 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: 
> true
> 2018-03-23 21:32:14,439 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set 
> as 3600. The logs will be aggregated every 3600 seconds
> 2018-03-23 21:32:14,455 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  failed in state INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960)
> 2018-03-23 21:32:14,458 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceStop(148)) - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
>  waiting for pending aggregation during exit
> 2018-03-23 21:32:14,460 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state 
> INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> 

[jira] [Commented] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433250#comment-16433250
 ] 

Hudson commented on YARN-8133:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13963 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13963/])
YARN-8133. Doc link broken for yarn-service from overview page. (Rohith 
(wangda: rev d919eb6efa1072517017c75fb323e391f4418dc8)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Concepts.md
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/QuickStart.md
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Overview.md
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/ServiceDiscovery.md
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/RegistryDNS.md


> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8133.01.patch, YARN-8133.02.patch
>
>
> I see that the documentation links are broken from the overview page. 
> Clicking any link from 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  causes an error. 
> It looks like the Overview page redirects to .md pages which don't exist. 
> It should redirect to *.html pages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8143) Improve log message when Capacity Scheduler request allocation on node

2018-04-10 Thread Zian Chen (JIRA)
Zian Chen created YARN-8143:
---

 Summary: Improve log message when Capacity Scheduler request 
allocation on node
 Key: YARN-8143
 URL: https://issues.apache.org/jira/browse/YARN-8143
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Zian Chen
Assignee: Zian Chen


When the scheduler tries to allocate a container on a node that has reserved 
containers on it, this log message prints very frequently and needs to be 
improved with additional condition checks (a purely illustrative throttling 
sketch follows the excerpt below).
{code:java}
2018-02-02 11:41:13,105 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
2018-02-02 11:41:13,115 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to 
fulfill reservation for application application_1517571510094_0003 on node: 
ctr-e137-1514896590304-52728-01-07.hwx.site:25454
2018-02-02 11:41:13,115 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - 
Reserved container  application=application_1517571510094_0003 
resource= 
queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e
 cluster=
{code}
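
As a purely illustrative example of the kind of condition check meant here (not 
the actual fix), the INFO line could be guarded so that it is emitted at most 
once per interval per application instead of on every allocation attempt:
{code:java}
// Illustrative sketch only. Track the last time the message was logged per key
// (e.g. per application id) and stay quiet until the interval has elapsed.
private final java.util.concurrent.ConcurrentMap<String, Long> lastLogged =
    new java.util.concurrent.ConcurrentHashMap<>();
private static final long LOG_INTERVAL_MS = 60_000L;

private boolean shouldLog(String key) {
  long now = System.currentTimeMillis();
  Long last = lastLogged.get(key);
  if (last != null && now - last < LOG_INTERVAL_MS) {
    return false;               // logged recently for this key, skip
  }
  lastLogged.put(key, now);     // best effort; a rare duplicate line is fine
  return true;
}

// Hypothetical usage at the allocation site:
// if (shouldLog(appId.toString())) {
//   LOG.info("Trying to fulfill reservation for application " + appId
//       + " on node: " + node);
// }
{code}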
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-10 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8133:
-
Fix Version/s: 3.2.0

> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8133.01.patch, YARN-8133.02.patch
>
>
> I see that the documentation links are broken from the overview page. 
> Clicking any link from 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  causes an error. 
> It looks like the Overview page redirects to .md pages which don't exist. 
> It should redirect to *.html pages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-10 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433225#comment-16433225
 ] 

Gour Saha edited comment on YARN-8133 at 4/11/18 12:28 AM:
---

Thanks [~rohithsharma]. 02 patch looks good. +1 for commit.


was (Author: gsaha):
Thanks [~rohithsharma]. 002 patch looks good. +1 for commit.

> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-8133.01.patch, YARN-8133.02.patch
>
>
> I see that the documentation links are broken from the overview page. 
> Clicking any link from 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  causes an error. 
> It looks like the Overview page redirects to .md pages which don't exist. 
> It should redirect to *.html pages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8133) Doc link broken for yarn-service from overview page.

2018-04-10 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433225#comment-16433225
 ] 

Gour Saha commented on YARN-8133:
-

Thanks [~rohithsharma]. 002 patch looks good. +1 for commit.

> Doc link broken for yarn-service from overview page.
> 
>
> Key: YARN-8133
> URL: https://issues.apache.org/jira/browse/YARN-8133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-8133.01.patch, YARN-8133.02.patch
>
>
> I see that the documentation links are broken from the overview page. 
> Clicking any link from 
> http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html
>  causes an error. 
> It looks like the Overview page redirects to .md pages which don't exist. 
> It should redirect to *.html pages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8140) Improve log message when launch cmd is ran for stopped yarn service

2018-04-10 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reassigned YARN-8140:
---

Assignee: Eric Yang

> Improve log message when launch cmd is ran for stopped yarn service
> ---
>
> Key: YARN-8140
> URL: https://issues.apache.org/jira/browse/YARN-8140
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
>
> Steps:
>  1) Launch sleeper app
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> sleeper2-duplicate-app-stopped 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> 18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms
> 18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: 
> application_1523387473707_0007
> Exit Code: 0{code}
> 2) Stop the application
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop 
> sleeper2-duplicate-app-stopped
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms
> 18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service 
> sleeper2-duplicate-app-stopped
> Exit Code: 0{code}
> 3) Launch the application with same name
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> sleeper2-duplicate-app-stopped 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> 18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms
> 18/04/10 21:31:22 ERROR client.ApiServiceClient: Service Instance dir already 
> exists: 
> hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json
> Exit Code: 56
> {code}
>  
> Here, launch cmd fails with "Service Instance dir already exists: 
> hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json".
>  
> The log message should be more meaningful. It should return that 
> "sleeper2-duplicate-app-stopped is in stopped state".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional 

[jira] [Comment Edited] (YARN-8142) yarn service application stops when AM is killed with SIGTERM

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433210#comment-16433210
 ] 

Eric Yang edited comment on YARN-8142 at 4/11/18 12:07 AM:
---

In Unix terms, SIGTERM is used to terminate an application. My impression is 
that this is the correct behavior, rather than starting another instance. If 
another signal is used (besides SIGKILL and SIGTERM), then spawning another 
instance might be the right thing to do.


was (Author: eyang):
In Unix terms, SIGTERM is used to terminate an application. My impression is 
that this is the correct behavior, rather than starting another instance. If 
another signal is used, then spawning another instance might be the right thing 
to do.

> yarn service application stops when AM is killed with SIGTERM
> -
>
> Key: YARN-8142
> URL: https://issues.apache.org/jira/browse/YARN-8142
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Billie Rinaldi
>Priority: Major
>
> Steps:
> 1) Launch sleeper job ( non-docker yarn service)
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> fault-test-am-sleeper 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms
> 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
> application_1522887500374_0010
> Exit Code: 0{code}
> 2) Wait for sleeper component to be up
> 3) Kill AM process PID
>  
> Expected behavior:
> New attempt of AM will be started. The pre-existing container will keep 
> running
>  
> Actual behavior:
> Application finishes with State : FINISHED and Final-State : ENDED
> New attempt was never launched
> Note: 
> When the AM gets a SIGTERM, it gracefully shuts itself down; it shuts the 
> entire app down instead of letting it continue to run for another attempt.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8142) yarn service application stops when AM is killed with SIGTERM

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433210#comment-16433210
 ] 

Eric Yang commented on YARN-8142:
-

In Unix terms, SIGTERM is used to terminate an application. My impression is 
that this is the correct behavior, rather than starting another instance. If 
another signal is used, then spawning another instance might be the right thing 
to do.

> yarn service application stops when AM is killed with SIGTERM
> -
>
> Key: YARN-8142
> URL: https://issues.apache.org/jira/browse/YARN-8142
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Billie Rinaldi
>Priority: Major
>
> Steps:
> 1) Launch sleeper job ( non-docker yarn service)
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> fault-test-am-sleeper 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms
> 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
> application_1522887500374_0010
> Exit Code: 0{code}
> 2) Wait for sleeper component to be up
> 3) Kill AM process PID
>  
> Expected behavior:
> New attempt of AM will be started. The pre-existing container will keep 
> running
>  
> Actual behavior:
> Application finishes with State : FINISHED and Final-State : ENDED
> New attempt was never launched
> Note: 
> When the AM gets a SIGTERM, it gracefully shuts itself down; it shuts the 
> entire app down instead of letting it continue to run for another attempt.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8126) [Follow up] Support auto-spawning of admin configured services during bootstrap of rm

2018-04-10 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433208#comment-16433208
 ] 

Gour Saha commented on YARN-8126:
-

I think this deserves to be a new sub-topic "System Services" on the left panel 
under "Service Discovery" (in the "YARN Service" section). It might seem that 
there is not enough information to warrant a new page, but there are two primary 
reasons I am inclined towards it:
 # An ordinary end-user cannot create or add services to be started as 
system services, so this should not sit in the existing pages, which focus on 
what ordinary end-users can do. In the new page we should specifically call out 
that this is a cluster-admin feature.
 # This is a pretty handy feature, and going forward the page is likely to grow 
as we add more system-service related features or add helpful system services to 
the framework itself, which would also need documentation to go with them.

What do you think?

> [Follow up] Support auto-spawning of admin configured services during 
> bootstrap of rm
> -
>
> Key: YARN-8126
> URL: https://issues.apache.org/jira/browse/YARN-8126
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8126.001.patch
>
>
> YARN-8048 adds support for auto-spawning of admin-configured services during 
> bootstrap of the RM. 
> This JIRA is to follow up on some of the comments discussed in YARN-8048. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior

2018-04-10 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433207#comment-16433207
 ] 

Billie Rinaldi commented on YARN-7984:
--

Thanks, [~eyang]! I plan to cherry-pick this to branch-3.1 as well.

> Delete registry entries from ZK on ServiceClient stop and clean up 
> stop/destroy behavior
> 
>
> Key: YARN-7984
> URL: https://issues.apache.org/jira/browse/YARN-7984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Fix For: 3.2.0
>
> Attachments: YARN-7984.1.patch, YARN-7984.2.patch
>
>
> The service records written to the registry are removed by ServiceClient on a 
> destroy call, but not on a stop call. The service AM does have some code to 
> clean up the registry entries when component instances are stopped, but if 
> the AM is killed before it has a chance to perform the cleanup, these entries 
> will be left in ZooKeeper. It would be better to clean these up in the stop 
> call, so that RegistryDNS does not provide lookups for containers that don't 
> exist.
> Additional stop/destroy behavior improvements include fixing errors / 
> unexpected behavior related to:
> * destroying a saved (not launched or started) service
> * destroying a stopped service
> * destroying a destroyed service
> * returning proper exit codes for destroy failures
> * performing other client operations on saved services (fixing NPEs)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8104) Add API to fetch node to attribute mapping

2018-04-10 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433193#comment-16433193
 ] 

Naganarasimha G R commented on YARN-8104:
-

Thanks for the patch, [~bibinchundatt].

At a high level:
 * Can you explain why NodeToAttributesProto was moved from 
yarn_server_resourcemanager_service_protos.proto to yarn_protos.proto?
 * Also, would it make sense to provide an overloaded method 
(getNodesToAttributes) here that also supports NodeID?

And a few other comments:

yarn_protos.proto
 * line 391: node => hostname ... similar to the earlier naming convention

yarn_service_protos.proto
 * line 282: nodeToAttributes => nodesToAttributes ... based on the convention 
followed in other places

GetNodesToAttributesRequest.java
 * lines 55 & 64: setNodes & getNodes -> setHostNames & getHostNames

GetNodesToAttributesRequestPBImpl
 * line 120: initNodeAttributes => initNodesToAttributesRequest, or just init
 * line 126: nodeLabelsList => hostNamesList


TestPBImplRecords
 * We need to invoke generateByNewInstance for all the new PBs in setup. Can 
you please check?

NodeAttributesManagerImpl
 * lines 454-457: Where there is no mapping, we currently set the hostname with 
an empty set. Is that better, or is it better to return only the hosts that 
have attributes? IMO returning only the hosts that have a mapping is better, so 
that we are not bloating the response (it is a map anyway); also, if nonexistent 
or erroneous hostnames are given, they would otherwise still show up with an 
empty set (see the sketch at the end of this comment).


TestClientRMService
 * line 2053: Can we have a separate method to test the node-to-attributes API, 
or document which APIs will be tested and rename the test to 
testNodeAttributesQueryAPI? I would still prefer the former option.

 

Can you also check the new findbugs issue, the checkstyle issues, and the 
javadoc issues reported, as they seem related to the patch and fixable?
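
Regarding the NodeAttributesManagerImpl point, a small sketch of what I mean 
(purely illustrative; attributesPerHost and requestedHostNames stand in for the 
manager's internal map and the requested hosts):
{code:java}
// Return only the hosts that actually have a mapping, instead of padding the
// response with empty sets for unknown or erroneous hostnames.
Map<String, Set<NodeAttribute>> result = new HashMap<>();
for (String host : requestedHostNames) {
  Set<NodeAttribute> attrs = attributesPerHost.get(host);
  if (attrs != null && !attrs.isEmpty()) {
    result.put(host, attrs);
  }
}
return result;
{code}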

> Add API to fetch node to attribute mapping
> --
>
> Key: YARN-8104
> URL: https://issues.apache.org/jira/browse/YARN-8104
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8104-YARN-3409.001.patch, 
> YARN-8104-YARN-3409.002.patch, YARN-8104-YARN-3409.003.patch
>
>
> Add node/host to attribute mapping in yarn client API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-04-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433188#comment-16433188
 ] 

Hudson commented on YARN-7973:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13962 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13962/])
YARN-7973. Added ContainerRelaunch feature for Docker containers.
(eyang: rev c467f311d0c7155c09052d93fac12045af925583)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerCommandExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerRelaunch.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitorResourceChange.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerStartCommand.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DefaultLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerStartCommand.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DelegatingLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.h
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerRelaunch.java


> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: 

[jira] [Commented] (YARN-7781) Update YARN-Services-Examples.md to be in sync with the latest code

2018-04-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433185#comment-16433185
 ] 

Jian He commented on YARN-7781:
---

sure, go ahead. thanks

> Update YARN-Services-Examples.md to be in sync with the latest code
> ---
>
> Key: YARN-7781
> URL: https://issues.apache.org/jira/browse/YARN-7781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Jian He
>Priority: Major
> Attachments: YARN-7781.01.patch, YARN-7781.02.patch, 
> YARN-7781.03.patch
>
>
> Update YARN-Services-Examples.md to make the following additions/changes:
> 1. Add an additional URL and PUT Request JSON to support flex:
> Update to flex up/down the no of containers (instances) of a component of a 
> service
> PUT URL – http://localhost:8088/app/v1/services/hello-world
> PUT Request JSON
> {code}
> {
>   "components" : [ {
> "name" : "hello",
> "number_of_containers" : 3
>   } ]
> }
> {code}
> 2. Modify all occurrences of /ws/ to /app/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8142) yarn service application stops when AM is killed with SIGTERM

2018-04-10 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi reassigned YARN-8142:


Assignee: Billie Rinaldi

> yarn service application stops when AM is killed with SIGTERM
> -
>
> Key: YARN-8142
> URL: https://issues.apache.org/jira/browse/YARN-8142
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Billie Rinaldi
>Priority: Major
>
> Steps:
> 1) Launch sleeper job ( non-docker yarn service)
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> fault-test-am-sleeper 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms
> 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
> application_1522887500374_0010
> Exit Code: 0{code}
> 2) Wait for sleeper component to be up
> 3) Kill AM process PID
>  
> Expected behavior:
> New attempt of AM will be started. The pre-existing container will keep 
> running
>  
> Actual behavior:
> Application finishes with State : FINISHED and Final-State : ENDED
> New attempt was never launched
> Note: 
> When the AM gets a SIGTERM, it gracefully shuts itself down; it shuts the 
> entire app down instead of letting it continue to run for another attempt.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior

2018-04-10 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7984:

Fix Version/s: 3.2.0

> Delete registry entries from ZK on ServiceClient stop and clean up 
> stop/destroy behavior
> 
>
> Key: YARN-7984
> URL: https://issues.apache.org/jira/browse/YARN-7984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Fix For: 3.2.0
>
> Attachments: YARN-7984.1.patch, YARN-7984.2.patch
>
>
> The service records written to the registry are removed by ServiceClient on a 
> destroy call, but not on a stop call. The service AM does have some code to 
> clean up the registry entries when component instances are stopped, but if 
> the AM is killed before it has a chance to perform the cleanup, these entries 
> will be left in ZooKeeper. It would be better to clean these up in the stop 
> call, so that RegistryDNS does not provide lookups for containers that don't 
> exist.
> Additional stop/destroy behavior improvements include fixing errors / 
> unexpected behavior related to:
> * destroying a saved (not launched or started) service
> * destroying a stopped service
> * destroying a destroyed service
> * returning proper exit codes for destroy failures
> * performing other client operations on saved services (fixing NPEs)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-04-10 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7973:

Fix Version/s: 3.2.0

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch, 
> YARN-7973.003.patch, YARN-7973.004.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433162#comment-16433162
 ] 

Eric Yang commented on YARN-7221:
-

The TestContainerSchedulerQueuing unit test failure is not related to the 
changes in this patch.

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, 
> YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, 
> YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, 
> YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, 
> YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, 
> YARN-7221.021.patch, YARN-7221.022.patch
>
>
> When a docker container is running with privileges, the majority use case is to 
> have some program start as root and then drop privileges to another user, e.g. 
> httpd starting privileged to bind to port 80 and then dropping privileges to 
> the www user.  
> # We should add a security check for submitting users, to verify they have 
> "sudo" access to run privileged containers.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with both the --privileged=true and --user=uid:gid 
> flags. With this parameter combination, the user will not have access to become 
> root; all docker exec commands are dropped to the uid:gid user instead of being 
> granted privileges. A user can gain root privileges if the container file 
> system contains files that give the user extra power, but this type of image is 
> considered dangerous. A non-privileged user can launch a container with special 
> bits to acquire the same level of root power. Hence, we lose control of which 
> images should be run with --privileged, and of who has sudo rights to use 
> privileged container images. As a result, we should check for sudo access and 
> then decide whether to parameterize --privileged=true OR --user=uid:gid. This 
> will avoid leading developers down the wrong path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early

2018-04-10 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433137#comment-16433137
 ] 

Eric Badger commented on YARN-7189:
---

Attaching a first patch to fix this issue. There is a race in the removal of the 
docker container where the pid may no longer be valid (no such process) while 
the docker container is still in the running state. Because of that, this patch 
adds an exponential backoff to the removal: it tries up to 5 iterations with 
increasing sleep times and gives up after the last one.
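
For illustration only, the retry logic amounts to something like the following 
(the actual change is in container-executor's native code; removeOnce() is a 
hypothetical stand-in for the real docker rm invocation):
{code:java}
// Sketch of the backoff described above: try the removal up to 5 times,
// sleeping for an increasing interval between attempts, then give up.
static boolean removeWithBackoff(String containerName) throws InterruptedException {
  long sleepMs = 1000L;                  // illustrative initial delay
  for (int attempt = 1; attempt <= 5; attempt++) {
    if (removeOnce(containerName)) {     // hypothetical removal attempt
      return true;
    }
    if (attempt < 5) {
      Thread.sleep(sleepMs);
      sleepMs *= 2;                      // back off before the next attempt
    }
  }
  return false;                          // give up after the fifth attempt
}
{code}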

> Container-executor doesn't remove Docker containers that error out early
> 
>
> Key: YARN-7189
> URL: https://issues.apache.org/jira/browse/YARN-7189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.9.0, 2.8.3, 3.0.1
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-7189-b3.0.001.patch
>
>
> Once the docker run command is executed, the docker container is created 
> unless the return code is 125, meaning that the run command itself failed 
> (https://docs.docker.com/engine/reference/run/#exit-status). Any error that 
> happens after the docker run requires the container to be removed during cleanup.
> {noformat:title=container-executor.c:launch_docker_container_as_user}
>   snprintf(docker_command_with_binary, command_size, "%s %s", docker_binary, 
> docker_command);
>   fprintf(LOGFILE, "Launching docker container...\n");
>   FILE* start_docker = popen(docker_command_with_binary, "r");
> {noformat}
> This is fixed by YARN-5366, which changes how we remove containers. However, 
> that was committed into 3.1.0. 2.8, 2.9, and 3.0 are all affected



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7189) Container-executor doesn't remove Docker containers that error out early

2018-04-10 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7189:
--
Attachment: YARN-7189-b3.0.001.patch

> Container-executor doesn't remove Docker containers that error out early
> 
>
> Key: YARN-7189
> URL: https://issues.apache.org/jira/browse/YARN-7189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.9.0, 2.8.3, 3.0.1
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-7189-b3.0.001.patch
>
>
> Once the docker run command is executed, the docker container is created 
> unless the return code is 125, meaning that the run command itself failed 
> (https://docs.docker.com/engine/reference/run/#exit-status). Any error that 
> happens after the docker run requires the container to be removed during cleanup.
> {noformat:title=container-executor.c:launch_docker_container_as_user}
>   snprintf(docker_command_with_binary, command_size, "%s %s", docker_binary, 
> docker_command);
>   fprintf(LOGFILE, "Launching docker container...\n");
>   FILE* start_docker = popen(docker_command_with_binary, "r");
> {noformat}
> This is fixed by YARN-5366, which changes how we remove containers. However, 
> that was committed into 3.1.0. 2.8, 2.9, and 3.0 are all affected



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8142) yarn service application stops when AM is killed with SIGTERM

2018-04-10 Thread Yesha Vora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8142:
-
Summary: yarn service application stops when AM is killed with SIGTERM  
(was: yarn service application stops when AM is killed)

> yarn service application stops when AM is killed with SIGTERM
> -
>
> Key: YARN-8142
> URL: https://issues.apache.org/jira/browse/YARN-8142
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Priority: Major
>
> Steps:
> 1) Launch sleeper job ( non-docker yarn service)
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> fault-test-am-sleeper 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> 18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms
> 18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
> application_1522887500374_0010
> Exit Code: 0{code}
> 2) Wait for sleeper component to be up
> 3) Kill AM process PID
>  
> Expected behavior:
> New attempt of AM will be started. The pre-existing container will keep 
> running
>  
> Actual behavior:
> Application finishes with State : FINISHED and Final-State : ENDED
> New attempt was never launched
> Note: 
> When the AM gets a SIGTERM, it gracefully shuts itself down; it shuts the 
> entire app down instead of letting it continue to run for another attempt.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8142) yarn service application stops when AM is killed

2018-04-10 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8142:


 Summary: yarn service application stops when AM is killed
 Key: YARN-8142
 URL: https://issues.apache.org/jira/browse/YARN-8142
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Yesha Vora


Steps:

1) Launch sleeper job ( non-docker yarn service)

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
fault-test-am-sleeper 
/usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
server at xxx:10200

18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
server at xxx:10200

18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json

18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms

18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
application_1522887500374_0010

Exit Code: 0{code}

2) Wait for sleeper component to be up

3) Kill AM process PID

 

Expected behavior:

New attempt of AM will be started. The pre-existing container will keep running

 

Actual behavior:

Application finishes with State : FINISHED and Final-State : ENDED

New attempt was never launched

Note: 

when the AM gets a SIGTERM, it gracefully shuts itself down and takes the 
entire app down with it, instead of letting the app continue to run for another attempt.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8142) yarn service application stops when AM is killed

2018-04-10 Thread Yesha Vora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8142:
-
Description: 
Steps:

1) Launch sleeper job ( non-docker yarn service)

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
fault-test-am-sleeper 
/usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
server at xxx:10200

18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
server at xxx:10200

18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json

18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms

18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
application_1522887500374_0010

Exit Code: 0{code}

2) Wait for sleeper component to be up

3) Kill AM process PID

 

Expected behavior:

New attempt of AM will be started. The pre-existing container will keep running

 

Actual behavior:

Application finishes with State : FINISHED and Final-State : ENDED

New attempt was never launched

Note: 

when the AM gets a SIGTERM, it gracefully shuts itself down and takes the 
entire app down with it, instead of letting the app continue to run for another attempt.

 

  was:
Steps:

1) Launch sleeper job ( non-docker yarn service)

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
fault-test-am-sleeper 
/usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
server at xxx:10200

18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
server at xxx:10200

18/04/06 22:24:24 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json

18/04/06 22:24:26 INFO util.log: Logging initialized @3631ms

18/04/06 22:24:37 INFO client.ApiServiceClient: Application ID: 
application_1522887500374_0010

Exit Code: 0{code}

2) Wait for sleeper component to be up

3) Kill AM process PID

 

Expected behavior:

New attempt of AM will be started. The pre-existing container will keep running

 

Actual behavior:

Application finishes with State : FINISHED and Final-State : ENDED

New attempt was never launched

Note: 

when the AM gets a SIGTERM, it gracefully shuts itself down and takes the 
entire app down with it, instead of letting the app continue to run for another attempt.

 


> yarn service application stops when AM is killed
> 
>
> Key: YARN-8142
> URL: https://issues.apache.org/jira/browse/YARN-8142
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Priority: Major
>
> Steps:
> 1) Launch sleeper job ( non-docker yarn service)
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> fault-test-am-sleeper 
> /usr/hdp/current/hadoop-yarn-client/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/06 22:24:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> server at xxx:10200
> 18/04/06 22:24:24 INFO client.AHSProxy: Connecting to Application History 
> 

[jira] [Commented] (YARN-8140) Improve log message when launch cmd is ran for stopped yarn service

2018-04-10 Thread Yesha Vora (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433093#comment-16433093
 ] 

Yesha Vora commented on YARN-8140:
--

yes [~eyang] this message sounds good.

> Improve log message when launch cmd is ran for stopped yarn service
> ---
>
> Key: YARN-8140
> URL: https://issues.apache.org/jira/browse/YARN-8140
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Priority: Major
>
> Steps:
>  1) Launch sleeper app
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> sleeper2-duplicate-app-stopped 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> 18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms
> 18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: 
> application_1523387473707_0007
> Exit Code: 0{code}
> 2) Stop the application
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop 
> sleeper2-duplicate-app-stopped
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms
> 18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service 
> sleeper2-duplicate-app-stopped
> Exit Code: 0{code}
> 3) Launch the application with same name
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> sleeper2-duplicate-app-stopped 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> 18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms
> 18/04/10 21:31:22 ERROR client.ApiServiceClient: Service Instance dir already 
> exists: 
> hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json
> Exit Code: 56
> {code}
>  
> Here, launch cmd fails with "Service Instance dir already exists: 
> hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json".
>  
> The log message should be more meaningful. It should return that 
> "sleeper2-duplicate-app-stopped is in stopped state".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-04-10 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433090#comment-16433090
 ] 

Eric Payne commented on YARN-4781:
--

bq. Hence in case we use sizeBasedWeight, we are considering pending as well. 
So i had this doubt..
I see what you mean. Good catch. I was not considering the {{sizeBasedWeight}} 
case. My first thought was to just use the 
{{FairOrderingPolicy#FairComparator}}, but that is for {{SchedulableEntity}}s 
like {{FiCaSchedulerApp}}, while the {{PriorityQueue}}s in 
{{FifoIntraQueuePreemptionPlugin}} sort {{TempAppPerPartition}}s, so I 
wouldn't be able to combine this feature with the 
{{FifoIntraQueuePreemptionPlugin}}.

It may be worthwhile to go back to your previous suggestion about splitting the 
common functionality out into an abstract {{AbstractIntraQueuePreemptionPlugin}} 
class and sub-classing the FiFo and Fair plugins.
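
For illustration only, a minimal sketch of what that split could look like. 
Every name below is a simplified stand-in for the idea being discussed 
({{TempApp}} stands in for {{TempAppPerPartition}}, and the two subclasses only 
differ in ordering); it is not an actual patch against the capacity scheduler.

{code:java}
// Hypothetical sketch of the proposed AbstractIntraQueuePreemptionPlugin
// split. All types here are simplified stand-ins, not the real Hadoop classes.
import java.util.Comparator;
import java.util.PriorityQueue;

class TempApp {                       // stand-in for TempAppPerPartition
  final int priority;
  final long usedMemory;
  TempApp(int priority, long usedMemory) {
    this.priority = priority;
    this.usedMemory = usedMemory;
  }
}

// Common selection machinery lives in the abstract base; only the ordering
// of candidate apps differs between the FiFo and Fair flavors.
abstract class AbstractIntraQueuePreemptionPlugin {
  protected abstract Comparator<TempApp> preemptionOrder();

  PriorityQueue<TempApp> orderForPreemption(Iterable<TempApp> apps) {
    PriorityQueue<TempApp> queue = new PriorityQueue<>(preemptionOrder());
    for (TempApp app : apps) {
      queue.add(app);
    }
    return queue;
  }
}

// FiFo flavor: keep the existing priority-based ordering.
class FifoPluginSketch extends AbstractIntraQueuePreemptionPlugin {
  @Override
  protected Comparator<TempApp> preemptionOrder() {
    return Comparator.comparingInt(a -> a.priority);
  }
}

// Fair flavor: order by current usage so the heaviest consumers are
// considered for preemption first.
class FairPluginSketch extends AbstractIntraQueuePreemptionPlugin {
  @Override
  protected Comparator<TempApp> preemptionOrder() {
    return Comparator.comparingLong((TempApp a) -> a.usedMemory).reversed();
  }
}
{code}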

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch
>
>
> We introduced the fairness ordering policy in YARN-3319, which lets large 
> applications make progress without starving small applications. However, if a 
> large application takes the queue’s resources and its containers have a long 
> lifespan, small applications could still wait a long time for resources and 
> SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we need 
> to preempt resources in queues that have the fairness policy enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8140) Improve log message when launch cmd is ran for stopped yarn service

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433083#comment-16433083
 ] 

Eric Yang commented on YARN-8140:
-

[~yeshavora] This is because the application has not been destroyed.  The 
subsequent launch failed because the duplicate service name is still in use.  I 
think the message is not wrong in telling the user that Hadoop can't deploy 
another service with the same name.  If the error message instead said 
"sleeper2-duplicate-app-stopped is in stopped state", the user might assume 
that a second service with the same name is persisted in Hadoop.  That is not 
the case, so the existing message is more precise.  We can change the message 
to "Service name sleeper2-duplicate-app-stopped is already taken: 
hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json".
  Will this work?

> Improve log message when launch cmd is ran for stopped yarn service
> ---
>
> Key: YARN-8140
> URL: https://issues.apache.org/jira/browse/YARN-8140
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Priority: Major
>
> Steps:
>  1) Launch sleeper app
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> sleeper2-duplicate-app-stopped 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> 18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms
> 18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: 
> application_1523387473707_0007
> Exit Code: 0{code}
> 2) Stop the application
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop 
> sleeper2-duplicate-app-stopped
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms
> 18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service 
> sleeper2-duplicate-app-stopped
> Exit Code: 0{code}
> 3) Launch the application with same name
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
> sleeper2-duplicate-app-stopped 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
> server at xx:10200
> 18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition 
> from local FS: 
> /usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json
> 18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms
> 18/04/10 21:31:22 ERROR 

[jira] [Comment Edited] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-04-10 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433081#comment-16433081
 ] 

Shane Kumpf edited comment on YARN-8141 at 4/10/18 10:19 PM:
-

Thanks for reporting this, [~leftnoteasy] - 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} is intended to be used for the purpose 
you call out. When that variable was added, 
{{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} was retained due to its 
existing use in native services. Is there a case where 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} won't work for your need? Maybe it is 
time we look to consolidate these two.


was (Author: shaneku...@gmail.com):
Thanks for reporting this, [~leftnoteasy] - 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} is intended to be used for the purpose 
you call out. When that variable was added, 
{{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} was retained due to its 
existing use in native services. Is there a case where 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} won't work for your need? Maybe it is 
time we do look to consolidate these two.

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Priority: Critical
>
> The existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec. It is important to allow the user to 
> mount local folders like /etc/passwd, etc.
> The following logic overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
> sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", 
> sb.toString());{code}
> Inside AbstractLauncher.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-04-10 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433081#comment-16433081
 ] 

Shane Kumpf commented on YARN-8141:
---

Thanks for reporting this, [~leftnoteasy] - 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} is intended to be used for the purpose 
you call out. When that variable was added, 
{{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} was retained due to its 
existing use in native services. Is there a case where 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} won't work for your need? Maybe it is 
time we do look to consolidate these two.

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Priority: Critical
>
> The existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec. It is important to allow the user to 
> mount local folders like /etc/passwd, etc.
> The following logic overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
> sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", 
> sb.toString());{code}
> Inside AbstractLauncher.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433080#comment-16433080
 ] 

genericqa commented on YARN-7221:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
1s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 39s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 56s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 53s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 87m 33s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7221 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918457/YARN-7221.022.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 1152cddcffed 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8ab776d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/20293/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20293/testReport/ |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433071#comment-16433071
 ] 

Eric Yang commented on YARN-7530:
-

[~leftnoteasy] YARN service has its dependencies set up backward because its 
precursor, Slider, was designed to run in YARN: instead of the server depending 
on the client, ServiceClient depends on hadoop-yarn-services-core and 
hadoop-yarn-server-common.  Therefore, it might be problematic to move 
hadoop-yarn-services-core to yarn common.  You are welcome to try, but it would 
be good to keep some of the YARN Service Application Master as a piece that is 
built after the yarn client + yarn servers, to avoid circular dependencies.

> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---
>
> Key: YARN-7530
> URL: https://issues.apache.org/jira/browse/YARN-7530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Trivial
> Fix For: yarn-native-services
>
> Attachments: YARN-7530.001.patch
>
>
> Hadoop-yarn-services-api is currently a parallel project to the 
> hadoop-yarn-services project.  For correctness, it would be better if 
> hadoop-yarn-services-api were part of hadoop-yarn-services.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration

2018-04-10 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433067#comment-16433067
 ] 

Jonathan Hung commented on YARN-7974:
-

Thanks for the comments.

For 1, I think this is possible. Initially I didn't want to add more APIs to 
ApplicationMasterProtocol, but we can add a field to AllocateRequestProto and 
just update the URL on the next call to allocate(), to avoid overcomplicating 
the protocol.

For 2, I will make this change.
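
Purely as a conceptual sketch of that allocate-based flow (the tracking-URL 
field on the allocate request is the proposal under discussion here, so the 
types below are local stand-ins rather than the real 
AllocateRequest/AllocateRequestProto API):

{code:java}
// Hypothetical sketch: carry an optional tracking URL on the periodic
// allocate() call and only overwrite the stored URL when one is sent.
// These classes are illustrative stand-ins, not Hadoop's real protocol types.
import java.util.Optional;

class AllocateRequestSketch {
  private Optional<String> trackingUrl = Optional.empty(); // proposed field

  AllocateRequestSketch withTrackingUrl(String url) {
    this.trackingUrl = Optional.ofNullable(url);
    return this;
  }

  Optional<String> getTrackingUrl() {
    return trackingUrl;
  }
}

class AppAttemptStateSketch {
  private String trackingUrl = "http://am-host:8080"; // set at registration

  // RM side: update the stored URL only if the AM reported a new one.
  void onAllocate(AllocateRequestSketch request) {
    request.getTrackingUrl().ifPresent(url -> this.trackingUrl = url);
  }

  String currentTrackingUrl() {
    return trackingUrl;
  }
}

class TrackingUrlUpdateDemo {
  public static void main(String[] args) {
    AppAttemptStateSketch attempt = new AppAttemptStateSketch();
    // The AM later hosts its UI on one of its containers and reports it.
    attempt.onAllocate(new AllocateRequestSketch()
        .withTrackingUrl("http://container-host:9090/ui"));
    System.out.println(attempt.currentTrackingUrl());
  }
}
{code}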

> Allow updating application tracking url after registration
> --
>
> Key: YARN-7974
> URL: https://issues.apache.org/jira/browse/YARN-7974
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-7974.001.patch, YARN-7974.002.patch
>
>
> Normally an application's tracking url is set on AM registration. We have a 
> use case for updating the tracking url after registration (e.g. the UI is 
> hosted on one of the containers).
> Currently we added a {{updateTrackingUrl}} API to ApplicationClientProtocol.
> We'll post the patch soon, assuming there are no issues with this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior

2018-04-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433062#comment-16433062
 ] 

Hudson commented on YARN-7984:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13959 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13959/])
YARN-7984. Improved YARN service stop/destroy and clean up.(eyang: 
rev d553799030a5a64df328319aceb35734d0b2de20)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api/src/test/java/org/apache/hadoop/yarn/service/ServiceClientTest.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestYarnNativeServices.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/client/ServiceClient.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api/src/main/java/org/apache/hadoop/yarn/service/webapp/ApiServer.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/ServiceTestUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services-api/src/test/java/org/apache/hadoop/yarn/service/TestApiServer.java


> Delete registry entries from ZK on ServiceClient stop and clean up 
> stop/destroy behavior
> 
>
> Key: YARN-7984
> URL: https://issues.apache.org/jira/browse/YARN-7984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-7984.1.patch, YARN-7984.2.patch
>
>
> The service records written to the registry are removed by ServiceClient on a 
> destroy call, but not on a stop call. The service AM does have some code to 
> clean up the registry entries when component instances are stopped, but if 
> the AM is killed before it has a chance to perform the cleanup, these entries 
> will be left in ZooKeeper. It would be better to clean these up in the stop 
> call, so that RegistryDNS does not provide lookups for containers that don't 
> exist.
> Additional stop/destroy behavior improvements include fixing errors / 
> unexpected behavior related to:
> * destroying a saved (not launched or started) service
> * destroying a stopped service
> * destroying a destroyed service
> * returning proper exit codes for destroy failures
> * performing other client operations on saved services (fixing NPEs)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-04-10 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8141:


 Summary: YARN Native Service: Respect 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
 Key: YARN-8141
 URL: https://issues.apache.org/jira/browse/YARN-8141
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Reporter: Wangda Tan


The existing YARN native service overwrites 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
user specified it in the service spec. It is important to allow the user to 
mount local folders like /etc/passwd, etc.

The following logic overwrites the 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
{code:java}
StringBuilder sb = new StringBuilder();
for (Entry mount : mountPaths.entrySet()) {
  if (sb.length() > 0) {
sb.append(",");
  }
  sb.append(mount.getKey());
  sb.append(":");
  sb.append(mount.getValue());
}
env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", 
sb.toString());{code}
Inside AbstractLauncher.java
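
For illustration only, here is a minimal sketch of the direction this implies: 
keep whatever the service spec already supplied and append the computed mounts 
instead of replacing them. The {{env}}/{{mountPaths}} names mirror the snippet 
above; this is not the committed fix.

{code:java}
// Hypothetical sketch: respect a user-supplied
// YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS value instead of
// unconditionally overwriting it. Not the actual AbstractLauncher change.
import java.util.HashMap;
import java.util.Map;

public class LocalResourceMountsSketch {
  static final String MOUNTS_VAR =
      "YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS";

  static void applyComputedMounts(Map<String, String> env,
      Map<String, String> mountPaths) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> mount : mountPaths.entrySet()) {
      if (sb.length() > 0) {
        sb.append(",");
      }
      sb.append(mount.getKey()).append(":").append(mount.getValue());
    }
    // Append the computed mounts to whatever the service spec provided,
    // rather than replacing the user's value.
    env.merge(MOUNTS_VAR, sb.toString(),
        (userValue, computed) -> userValue + "," + computed);
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    env.put(MOUNTS_VAR, "/etc/passwd:/etc/passwd"); // from the service spec
    Map<String, String> mounts = new HashMap<>();
    mounts.put("service.json", "/opt/service/service.json");
    applyComputedMounts(env, mounts);
    // Prints: /etc/passwd:/etc/passwd,service.json:/opt/service/service.json
    System.out.println(env.get(MOUNTS_VAR));
  }
}
{code}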



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8037) CGroupsResourceCalculator logs excessive warnings on container relaunch

2018-04-10 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433049#comment-16433049
 ] 

Shane Kumpf commented on YARN-8037:
---

Thanks [~miklos.szeg...@cloudera.com] - For most applications I only see a 
single exception for each of the subsystems, like the output above, so I'm not 
sure that will address the bulk of these. I have a few ideas to test out and 
I'll report back soon with more detail.

> CGroupsResourceCalculator logs excessive warnings on container relaunch
> ---
>
> Key: YARN-8037
> URL: https://issues.apache.org/jira/browse/YARN-8037
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Major
>
> When a container is relaunched, the old process no longer exists. When using 
> the {{CGroupsResourceCalculator}}, this results in the warning and exception 
> below being logged every second until the relaunch occurs, which is excessive 
> and fills up the logs.
> {code:java}
> 2018-03-16 14:30:33,438 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
>  Failed to parse 12844
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the 
> interim 12844
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: 
> /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_e01_1521209613260_0002_01_02/cpuacct.stat
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.(FileInputStream.java:138)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more
> 2018-03-16 14:30:33,438 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator:
>  Failed to parse cgroups 
> /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.memsw.usage_in_bytes
> org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the 
> interim 12844
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
> Caused by: java.io.FileNotFoundException: 
> /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_02/memory.usage_in_bytes
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.(FileInputStream.java:138)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
> ... 4 more{code}
> We should consider moving the exception to debug to reduce the noise at a 
> minimum. Alternatively, it may make sense to stop the existing 
> {{MonitoringThread}} during relaunch.
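
As a rough illustration of the "move the exception to debug" option (assuming 
an SLF4J logger on the classpath; this is not the real 
{{CGroupsResourceCalculator}} code), the per-second stack trace could be 
demoted while a terse warning is kept:

{code:java}
// Hypothetical sketch of the log-level change suggested above: one short WARN
// line so operators still see the condition, full stack trace only at DEBUG.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class VanishedProcessLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(VanishedProcessLoggingSketch.class);

  static void reportVanishedProcess(String pid, Exception cause) {
    LOG.warn("Process {} vanished while reading cgroup stats", pid);
    if (LOG.isDebugEnabled()) {
      LOG.debug("Failed to parse cgroup files for " + pid, cause);
    }
  }

  public static void main(String[] args) {
    reportVanishedProcess("12844", new java.io.FileNotFoundException(
        "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_x/cpuacct.stat"));
  }
}
{code}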



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8140) Improve log message when launch cmd is ran for stopped yarn service

2018-04-10 Thread Yesha Vora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8140:
-
Description: 
Steps:

 1) Launch sleeper app

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
sleeper2-duplicate-app-stopped 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms

18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: 
application_1523387473707_0007

Exit Code: 0{code}

2) Stop the application

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop 
sleeper2-duplicate-app-stopped

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms

18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service 
sleeper2-duplicate-app-stopped

Exit Code: 0{code}

3) Launch the application with same name

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
sleeper2-duplicate-app-stopped 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms

18/04/10 21:31:22 ERROR client.ApiServiceClient: Service Instance dir already 
exists: 
hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json

Exit Code: 56
{code}

 

Here, launch cmd fails with "Service Instance dir already exists: 
hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json".

 

The log message should be more meaningful. It should return that 
"sleeper2-duplicate-app-stopped is in stopped state".

  was:
Steps:

 1) Launch sleeper app

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
sleeper2-duplicate-app-stopped 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200


[jira] [Created] (YARN-8140) Improve log message when launch cmd is ran for stopped yarn service

2018-04-10 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8140:


 Summary: Improve log message when launch cmd is ran for stopped 
yarn service
 Key: YARN-8140
 URL: https://issues.apache.org/jira/browse/YARN-8140
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn-native-services
Affects Versions: 3.1.0
Reporter: Yesha Vora


Steps:

 1) Launch sleeper app

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
sleeper2-duplicate-app-stopped 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:01 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:01 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

18/04/10 21:31:03 INFO util.log: Logging initialized @2818ms

18/04/10 21:31:10 INFO client.ApiServiceClient: Application ID: 
application_1523387473707_0007

Exit Code: 0{code}

2) Stop the application

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -stop 
sleeper2-duplicate-app-stopped

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:15 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:16 INFO util.log: Logging initialized @3034ms

18/04/10 21:31:17 INFO client.ApiServiceClient: Successfully stopped service 
sleeper2-duplicate-app-stopped

Exit Code: 0{code}

3) Launch the application with same name

{code}

RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn app -launch 
sleeper2-duplicate-app-stopped 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
YARN_LOG_DIR.

WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
YARN_LOGFILE.

WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
YARN_PID_DIR.

WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.

18/04/10 21:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:19 INFO client.AHSProxy: Connecting to Application History 
server at xx:10200

18/04/10 21:31:19 INFO client.ApiServiceClient: Loading service definition from 
local FS: 
/usr/hdp/3.0.0.0-xxx/hadoop-yarn/yarn-service-examples/sleeper/sleeper.json

18/04/10 21:31:22 INFO util.log: Logging initialized @4456ms

18/04/10 21:31:22 ERROR client.ApiServiceClient: Service Instance dir already 
exists: 
hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json

Exit Code: 56

{code}

 

Here, launch cmd fails with "Service Instance dir already exists: 
hdfs://mycluster/user/hrt_qa/.yarn/services/sleeper2-duplicate-app-stopped/sleeper2-duplicate-app-stopped.json".

 

The log message should be more meaningful. It should return that 
"sleeper2-duplicate-app-stopped is in stopped state".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7984) Delete registry entries from ZK on ServiceClient stop and clean up stop/destroy behavior

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433038#comment-16433038
 ] 

Eric Yang commented on YARN-7984:
-

[~billie.rinaldi] +1 to commit. Patch 2 looks good to me.  The stop and destroy 
commands work better with the new error handling.

> Delete registry entries from ZK on ServiceClient stop and clean up 
> stop/destroy behavior
> 
>
> Key: YARN-7984
> URL: https://issues.apache.org/jira/browse/YARN-7984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-7984.1.patch, YARN-7984.2.patch
>
>
> The service records written to the registry are removed by ServiceClient on a 
> destroy call, but not on a stop call. The service AM does have some code to 
> clean up the registry entries when component instances are stopped, but if 
> the AM is killed before it has a chance to perform the cleanup, these entries 
> will be left in ZooKeeper. It would be better to clean these up in the stop 
> call, so that RegistryDNS does not provide lookups for containers that don't 
> exist.
> Additional stop/destroy behavior improvements include fixing errors / 
> unexpected behavior related to:
> * destroying a saved (not launched or started) service
> * destroying a stopped service
> * destroying a destroyed service
> * returning proper exit codes for destroy failures
> * performing other client operations on saved services (fixing NPEs)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6925) FSSchedulerNode could be simplified extracting preemption fields into a class

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-6925:
--

Assignee: (was: Yufei Gu)

> FSSchedulerNode could be simplified extracting preemption fields into a class
> -
>
> Key: YARN-6925
> URL: https://issues.apache.org/jira/browse/YARN-6925
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Miklos Szegedi
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6971) Clean up different ways to create resources

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-6971:
--

Assignee: (was: Yufei Gu)

> Clean up different ways to create resources
> ---
>
> Key: YARN-6971
> URL: https://issues.apache.org/jira/browse/YARN-6971
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Yufei Gu
>Priority: Minor
>  Labels: newbie
>
> There are several ways to create a {{resource}} object, e.g., 
> BuilderUtils.newResource() and Resources.createResource(). These methods not 
> only cause confusion but also performance issues; for example, 
> BuilderUtils.newResource() is significantly slower than 
> Resources.createResource(). 
> We could merge them somehow, and replace most BuilderUtils.newResource() calls 
> with Resources.createResource().
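
For reference, a minimal sketch contrasting the two creation paths named above 
(assuming hadoop-yarn-common and hadoop-yarn-server-common are on the 
classpath; the values are arbitrary and this is not taken from an actual 
cleanup patch):

{code:java}
// Sketch of the two helpers mentioned above; not part of any cleanup patch.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.utils.BuilderUtils;
import org.apache.hadoop.yarn.util.resource.Resources;

public class ResourceCreationSketch {
  public static void main(String[] args) {
    // Record-factory based helper from the server-side utils.
    Resource viaBuilderUtils = BuilderUtils.newResource(1024, 1);
    // Lighter-weight helper that the description suggests standardizing on.
    Resource viaResources = Resources.createResource(1024, 1);
    System.out.println(viaBuilderUtils + " vs " + viaResources);
  }
}
{code}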



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6941) Allow Queue placement policies to be ordered by attribute

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-6941:
--

Assignee: (was: Yufei Gu)

> Allow Queue placement policies to be ordered by attribute
> -
>
> Key: YARN-6941
> URL: https://issues.apache.org/jira/browse/YARN-6941
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Yufei Gu
>Priority: Minor
>
> It would be nice to add a feature that would allow users to provide an 
> "order" or "index" in which the placement policies should apply, rather than 
> just the native policy order as included in the XML.
> For instance, the following two examples would be the same:
> Natural order: (XML example not preserved in the archive)
> Indexed Order: (XML example not preserved in the archive)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-3797) NodeManager not blacklisting the disk (shuffle) with errors

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-3797:
--

Assignee: (was: Yufei Gu)

> NodeManager not blacklisting the disk (shuffle) with errors
> ---
>
> Key: YARN-3797
> URL: https://issues.apache.org/jira/browse/YARN-3797
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Rajesh Balamohan
>Priority: Major
>
> In a multi-node environment, one of the disks (where map outputs are written) 
> on a node went bad. The errors are given below.
> {noformat}
> Info fld=0x9ad090a
> sd 6:0:5:0: [sdf]  Add. Sense: Unrecovered read error
> sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 ad 09 08 00 00 08 00
> end_request: critical medium error, dev sdf, sector 162334984
> mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
> sd 6:0:5:0: [sdf]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 6:0:5:0: [sdf]  Sense Key : Medium Error [current]
> Info fld=0x9af8892
> sd 6:0:5:0: [sdf]  Add. Sense: Unrecovered read error
> sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 af 88 90 00 00 08 00
> end_request: critical medium error, dev sdf, sector 162498704
> mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
> mpt2sas0: log_info(0x3108): originator(PL), code(0x08), sub_code(0x)
> sd 6:0:5:0: [sdf]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 6:0:5:0: [sdf]  Sense Key : Medium Error [current]
> Info fld=0x9af8892
> sd 6:0:5:0: [sdf]  Add. Sense: Unrecovered read error
> sd 6:0:5:0: [sdf] CDB: Read(10): 28 00 09 af 88 90 00 00 08 00
> end_request: critical medium error, dev sdf, sector 162498704
> {noformat}
> The disk checker would pass, since the system still allows directories to be 
> created and deleted without issue. However, the data being served out can be 
> corrupt, and fetchers fail during CRC verification with unwanted delays and 
> retries. Ideally, the NodeManager should detect such errors and 
> blacklist/remove those disks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-3890) FairScheduler should show the scheduler health metrics similar to ones added in CapacityScheduler

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-3890:
--

Assignee: Gergo Repas  (was: Yufei Gu)

> FairScheduler should show the scheduler health metrics similar to ones added 
> in CapacityScheduler
> -
>
> Key: YARN-3890
> URL: https://issues.apache.org/jira/browse/YARN-3890
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Gergo Repas
>Priority: Major
>
> We should add the information displayed in YARN-3293 to FairScheduler as well, 
> possibly sharing the implementation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5824) Verify app starvation under custom preemption thresholds and timeouts

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-5824:
--

Assignee: (was: Yufei Gu)

> Verify app starvation under custom preemption thresholds and timeouts
> -
>
> Key: YARN-5824
> URL: https://issues.apache.org/jira/browse/YARN-5824
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Priority: Major
>
> YARN-5783 adds basic tests to verify that applications are identified as 
> starved. This JIRA is to add more advanced tests for different values of 
> preemption thresholds and timeouts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6324) The log4j.properties in sample-conf doesn't work well for SLS

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-6324:
--

Assignee: (was: Yufei Gu)

> The log4j.properties in sample-conf doesn't work well for SLS
> -
>
> Key: YARN-6324
> URL: https://issues.apache.org/jira/browse/YARN-6324
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Reporter: Yufei Gu
>Priority: Major
>
> Many log messages are missing; for example, there is no way to find the RM logs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7347) Fix the bug in Fair scheduler to handle a queue named "root.root"

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-7347:
--

Assignee: Gergo Repas  (was: Yufei Gu)

> Fix the bug in Fair scheduler to handle a queue named "root.root"
> --
>
> Key: YARN-7347
> URL: https://issues.apache.org/jira/browse/YARN-7347
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, reservation system
>Reporter: Yufei Gu
>Assignee: Gergo Repas
>Priority: Major
>
> A queue named "root.root" may cause issues in the Fair Scheduler. For example, 
> if we set the queue (root.root) to be reservable and then submit a job into it, 
> we get the following error.
> {code}
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed 
> to submit application_1508176133973_0002 to YARN : root.root is not a leaf 
> queue
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:339)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:253)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:307)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:360)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:368)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>   at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit 
> application_1508176133973_0002 to YARN : root.root is not a leaf queue
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:293)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:298)
>   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:324)
>   ... 25 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7948) Enable refreshing maximum allocation for multiple resource types

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-7948:
--

Assignee: Szilard Nemeth  (was: Yufei Gu)

> Enable refreshing maximum allocation for multiple resource types
> 
>
> Key: YARN-7948
> URL: https://issues.apache.org/jira/browse/YARN-7948
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0
>Reporter: Yufei Gu
>Assignee: Szilard Nemeth
>Priority: Major
>
> YARN-7738 did the same thing for the CapacityScheduler. We need a fix for the 
> FairScheduler. We could fix it by moving the refresh code from the 
> CapacityScheduler class to AbstractYARNScheduler.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7968) Reset the queue name in submission context while recovering an application

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu resolved YARN-7968.

Resolution: Won't Fix

> Reset the queue name in submission context while recovering an application
> --
>
> Key: YARN-7968
> URL: https://issues.apache.org/jira/browse/YARN-7968
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>Priority: Major
>
> After YARN-7139, a new application gets the correct queue name in its 
> submission context. We need to do the same thing when recovering an application. 
> {code}
>   if (isAppRecovering) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug(applicationId
>   + " is recovering. Skip notifying APP_ACCEPTED");
> }
>   } else {
> // During tests we do not always have an application object, handle
> // it here but we probably should fix the tests
> if (rmApp != null && rmApp.getApplicationSubmissionContext() != null) 
> {
>   // Before we send out the event that the app is accepted is
>   // to set the queue in the submissionContext (needed on restore etc)
>   rmApp.getApplicationSubmissionContext().setQueue(queue.getName());
> }
> rmContext.getDispatcher().getEventHandler().handle(
> new RMAppEvent(applicationId, RMAppEventType.APP_ACCEPTED));
>   }
> {code}
> We can do it by moving the 
> {{rmApp.getApplicationSubmissionContext().setQueue}} block out of the if-else 
> block, as sketched below. cc [~wilfreds].
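
A minimal sketch of the proposed restructuring (illustrative only; YARN-7968 was resolved as Won't Fix), reusing the variables from the snippet above:
{code:java}
// Set the queue on the submission context for both the recovering and the
// non-recovering path, then keep the APP_ACCEPTED notification conditional.
if (rmApp != null && rmApp.getApplicationSubmissionContext() != null) {
  rmApp.getApplicationSubmissionContext().setQueue(queue.getName());
}

if (isAppRecovering) {
  if (LOG.isDebugEnabled()) {
    LOG.debug(applicationId + " is recovering. Skip notifying APP_ACCEPTED");
  }
} else {
  rmContext.getDispatcher().getEventHandler().handle(
      new RMAppEvent(applicationId, RMAppEventType.APP_ACCEPTED));
}
{code}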



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7263) Check host name resolution performance when resource manager starts up

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-7263:
--

Assignee: (was: Yufei Gu)

> Check host name resolution performance when resource manager starts up
> --
>
> Key: YARN-7263
> URL: https://issues.apache.org/jira/browse/YARN-7263
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Priority: Major
>
> According to YARN-7207, host name resolution can be slow in some environments, 
> which affects RM performance in different ways. It would be nice to check this 
> when the RM starts up and place a warning message in the logs if the 
> performance is not ideal. 
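
A hedged sketch of the kind of startup check described above (not existing RM code; the one-second threshold is illustrative):
{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameResolutionCheck {
  public static void main(String[] args) throws UnknownHostException {
    long start = System.nanoTime();
    InetAddress local = InetAddress.getLocalHost();   // forward lookup
    String name = local.getCanonicalHostName();       // reverse lookup
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    if (elapsedMs > 1000) {
      System.err.println("WARN: resolving " + name + " took " + elapsedMs
          + " ms; RM performance may suffer (see YARN-7207).");
    } else {
      System.out.println("Host name resolution OK: " + name + " in " + elapsedMs + " ms");
    }
  }
}
{code}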



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6527) Provide a better out-of-the-box experience for SLS

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-6527:
--

Assignee: (was: Yufei Gu)

> Provide a better out-of-the-box experience for SLS
> --
>
> Key: YARN-6527
> URL: https://issues.apache.org/jira/browse/YARN-6527
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Affects Versions: 3.0.0-alpha4
>Reporter: Robert Kanter
>Priority: Major
>
> The example provided with SLS appears to be broken - I didn't see any jobs 
> running.  On top of that, it seems like getting SLS to run properly requires 
> a lot of Hadoop site configs, scheduler configs, etc.  I was only able to get 
> something running after [~yufeigu] provided a lot of config files.
> We should provide a better out-of-the-box experience for SLS.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8061) An application may preempt itself in case of minshare preemption

2018-04-10 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-8061:
--

Assignee: (was: Yufei Gu)

> An application may preempt itself in case of minshare preemption
> 
>
> Key: YARN-8061
> URL: https://issues.apache.org/jira/browse/YARN-8061
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Yufei Gu
>Priority: Major
>
> Assume a leaf queue A has a minshare of 10G memory and a fair share of 12G. It 
> has used 4G, so its minshare-starved resources amount to 6G, which will be 
> distributed to all its apps. Assume there are 4 apps a1, a2, a3, a4 inside, 
> which demand 3G, 2G, 1G, and 0.5G. a1 gets 3G of minshare-starved resources, a2 
> gets 2G, and a3 gets 1G; they are all considered starved apps, except a4, which 
> doesn't get any. 
> An app can preempt another under the same queue due to minshare starvation. 
> For example, a1 can preempt a4 if a4 uses more resources than its fair share, 
> which is 3G (12G/4). If a1 itself used more than 3G of memory, it will preempt 
> itself! I will create a unit test later. 
> The solution would be to check an application's fair share while distributing 
> minshare starvation; more details in method 
> {{FSLeafQueue#updateStarvedAppsMinshare()}}.
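
A hedged, self-contained sketch of the proposed check (plain longs in whole GB instead of YARN Resource objects; the class and field names are illustrative, not the real FairScheduler types):
{code:java}
import java.util.Arrays;
import java.util.List;

public class MinshareStarvationSketch {
  static class App {
    final String name; final long demand, usage, fairShare;
    App(String n, long d, long u, long f) { name = n; demand = d; usage = u; fairShare = f; }
  }

  public static void main(String[] args) {
    long remaining = 10 - 4;  // queue minshare 10G, usage 4G -> 6G to distribute
    List<App> apps = Arrays.asList(
        new App("a1", 3, 4, 3),   // already at/above its 3G fair share
        new App("a2", 2, 1, 3),
        new App("a3", 1, 1, 3),
        new App("a4", 1, 0, 3));
    for (App app : apps) {
      // Key change: cap the handed-out starvation by the app's remaining headroom
      // to its own fair share, so an app already over its fair share is never
      // marked minshare-starved and can never end up preempting itself.
      long headroom = Math.max(0, app.fairShare - app.usage);
      long starvation = Math.min(app.demand, Math.min(remaining, headroom));
      remaining -= starvation;
      System.out.println(app.name + " is minshare-starved for " + starvation + "G");
    }
  }
}
{code}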



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7494) Add multi node lookup support for better placement

2018-04-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432970#comment-16432970
 ] 

Wangda Tan commented on YARN-7494:
--

Thanks [~sunilg], 

In general the change looks good. Could you check the UT failures?

[~cheersyang] please commit the patch once you think it is ready.

> Add multi node lookup support for better placement
> -
>
> Key: YARN-7494
> URL: https://issues.apache.org/jira/browse/YARN-7494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-7494.001.patch, YARN-7494.002.patch, 
> YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, 
> YARN-7494.006.patch, YARN-7494.v0.patch, YARN-7494.v1.patch, 
> multi-node-designProposal.png
>
>
> Instead of a single node, for effectiveness we can consider a multi-node 
> lookup based on partition to start with.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8116) Nodemanager fails with NumberFormatException: For input string: ""

2018-04-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432938#comment-16432938
 ] 

Wangda Tan commented on YARN-8116:
--

+1, thanks [~csingh], will commit shortly.

> Nodemanager fails with NumberFormatException: For input string: ""
> --
>
> Key: YARN-8116
> URL: https://issues.apache.org/jira/browse/YARN-8116
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8116.001.patch, YARN-8116.002.patch
>
>
> Steps followed.
> 1) Update nodemanager debug delay config
> {code}
> <property>
>   <name>yarn.nodemanager.delete.debug-delay-sec</name>
>   <value>350</value>
> </property>
> {code}
> 2) Launch distributed shell application multiple times
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn  jar 
> hadoop-yarn-applications-distributedshell-*.jar  -shell_command "sleep 120" 
> -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL=true -jar 
> hadoop-yarn-applications-distributedshell-*.jar{code}
> 3) restart NM
> Nodemanager fails to start with the below error.
> {code:title=NM log}
> 2018-03-23 21:32:14,437 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:serviceInit(181)) - ContainersMonitor enabled: 
> true
> 2018-03-23 21:32:14,439 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceInit(130)) - rollingMonitorInterval is set 
> as 3600. The logs will be aggregated every 3600 seconds
> 2018-03-23 21:32:14,455 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  failed in state INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:899)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:960)
> 2018-03-23 21:32:14,458 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:serviceStop(148)) - 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
>  waiting for pending aggregation during exit
> 2018-03-23 21:32:14,460 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state 
> INITED
> java.lang.NumberFormatException: For input string: ""
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:601)
>   at java.lang.Long.parseLong(Long.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState(NMLeveldbStateStoreService.java:350)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState(NMLeveldbStateStoreService.java:253)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:365)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:464)
>   at 
> 

[jira] [Updated] (YARN-7221) Add security check for privileged docker container

2018-04-10 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7221:

Attachment: YARN-7221.022.patch

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, 
> YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, 
> YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, 
> YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, 
> YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, 
> YARN-7221.021.patch, YARN-7221.022.patch
>
>
> When a Docker container is running with privileges, the majority of use cases 
> involve some program starting as root and then dropping privileges to another 
> user, e.g., httpd starting privileged to bind to port 80, then dropping 
> privileges to the www user.  
> # We should add a security check for submitting users, to verify they have 
> "sudo" access to run privileged containers.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with the --privileged=true and --user=uid:gid flags. 
> With this parameter combination, the user will not have access to become the 
> root user. All docker exec commands will be dropped to the uid:gid user instead 
> of being granted privileges. A user can gain root privileges if the container 
> file system contains files that give the user extra power, but this type of 
> image is considered dangerous. A non-privileged user can launch a container 
> with special bits to acquire the same level of root power. Hence, we lose 
> control over which images should be run with --privileged, and over who has 
> sudo rights to use privileged container images. As a result, we should check 
> for sudo access and then decide whether to parameterize --privileged=true OR 
> --user=uid:gid. This will avoid leading developers down the wrong path; a 
> simplified sketch of the decision follows below.
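
A hedged, simplified sketch of the decision flow described above (illustrative only; the real enforcement lives in the container runtime and container-executor, and the helper method here is hypothetical):
{code:java}
import java.util.ArrayList;
import java.util.List;

public class DockerRunArgsSketch {
  // Hypothetical helper: whether the submitting user is allowed to run privileged containers.
  static boolean hasPrivilegedContainerSudoAccess(String user) {
    return "admin".equals(user);  // stand-in for a real sudo/ACL check
  }

  static List<String> buildRunArgs(String user, String uidGid, boolean privilegedRequested) {
    List<String> args = new ArrayList<>();
    if (privilegedRequested) {
      if (!hasPrivilegedContainerSudoAccess(user)) {
        throw new SecurityException(user + " is not allowed to run privileged containers");
      }
      args.add("--privileged=true");   // privileged: do NOT also pass --user
    } else {
      args.add("--user=" + uidGid);    // non-privileged: drop to the container user
    }
    return args;
  }

  public static void main(String[] args) {
    System.out.println(buildRunArgs("admin", "1000:1000", true));
    System.out.println(buildRunArgs("alice", "1000:1000", false));
  }
}
{code}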



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-04-10 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432936#comment-16432936
 ] 

Eric Yang commented on YARN-7221:
-

Patch 22 rebased to current trunk.

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, 
> YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, 
> YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, 
> YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, 
> YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, 
> YARN-7221.021.patch, YARN-7221.022.patch
>
>
> When a Docker container is running with privileges, the majority of use cases 
> involve some program starting as root and then dropping privileges to another 
> user, e.g., httpd starting privileged to bind to port 80, then dropping 
> privileges to the www user.  
> # We should add a security check for submitting users, to verify they have 
> "sudo" access to run privileged containers.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with the --privileged=true and --user=uid:gid flags. 
> With this parameter combination, the user will not have access to become the 
> root user. All docker exec commands will be dropped to the uid:gid user instead 
> of being granted privileges. A user can gain root privileges if the container 
> file system contains files that give the user extra power, but this type of 
> image is considered dangerous. A non-privileged user can launch a container 
> with special bits to acquire the same level of root power. Hence, we lose 
> control over which images should be run with --privileged, and over who has 
> sudo rights to use privileged container images. As a result, we should check 
> for sudo access and then decide whether to parameterize --privileged=true OR 
> --user=uid:gid. This will avoid leading developers down the wrong path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-04-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432934#comment-16432934
 ] 

Wangda Tan commented on YARN-7530:
--

[~eyang], thanks for sharing your thoughts.

To me, the current scope of native services is already beyond a single, 
self-contained app on YARN:

1) The YARN Service API is part of the RM. 

2) After YARN-8048, system services can be deployed before running any other 
applications.

I think we should move the API / client code to proper places to avoid loading 
the native service client / API logic via reflection.

This doesn't block anything for now, but I think it will be important to clean 
it up to get more contributions from the community.

> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---
>
> Key: YARN-7530
> URL: https://issues.apache.org/jira/browse/YARN-7530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Trivial
> Fix For: yarn-native-services
>
> Attachments: YARN-7530.001.patch
>
>
> Hadoop-yarn-services-api is currently a parallel project to the 
> hadoop-yarn-services project.  It would be better if hadoop-yarn-services-api 
> were part of hadoop-yarn-services for correctness.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration

2018-04-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432922#comment-16432922
 ] 

Wangda Tan commented on YARN-7974:
--

[~jhung],

Thanks for working on the feature, I can see its value. 

For implementation / API:

1) Have you considered only allowing the AM to update the tracking URL? That 
could solve some problems, e.g., a. the need to properly check ACLs for the 
change, and b. issues caused by concurrent writes to the tracking URL.

2) I think the updated tracking URL needs to be persisted as well; otherwise an 
RM restart would clear the updated information.

> Allow updating application tracking url after registration
> --
>
> Key: YARN-7974
> URL: https://issues.apache.org/jira/browse/YARN-7974
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-7974.001.patch, YARN-7974.002.patch
>
>
> Normally an application's tracking URL is set on AM registration. We have a 
> use case for updating the tracking URL after registration (e.g. the UI is 
> hosted on one of the containers).
> Currently we added an {{updateTrackingUrl}} API to ApplicationClientProtocol.
> We'll post the patch soon, assuming there are no issues with this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration

2018-04-10 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432900#comment-16432900
 ] 

Jonathan Hung commented on YARN-7974:
-

Hi [~wangda] - this is the tracking url change I mentioned during last week's 
meeting. Would appreciate if you could take a look if you have the chance :)

> Allow updating application tracking url after registration
> --
>
> Key: YARN-7974
> URL: https://issues.apache.org/jira/browse/YARN-7974
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-7974.001.patch, YARN-7974.002.patch
>
>
> Normally an application's tracking URL is set on AM registration. We have a 
> use case for updating the tracking URL after registration (e.g. the UI is 
> hosted on one of the containers).
> Currently we added an {{updateTrackingUrl}} API to ApplicationClientProtocol.
> We'll post the patch soon, assuming there are no issues with this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications

2018-04-10 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432723#comment-16432723
 ] 

Vrushali C commented on YARN-8130:
--

Yes, I agree, we need a configurable delay like the collectorLingerPeriod in 
PerNodeTimelineCollectorsAuxService#removeApplicationCollector. 

We need to check whether there are other places where we are removing the app id 
from some map. 

Relevant JIRAs for collectorLingerPeriod: YARN-3995 and YARN-7835.

> Race condition when container events are published for KILLED applications
> --
>
> Key: YARN-8130
> URL: https://issues.apache.org/jira/browse/YARN-8130
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Reporter: Charan Hebri
>Priority: Major
>
> There seems to be a race condition happening when an application is KILLED 
> and the corresponding container event information is being published. For 
> completed containers, a YARN_CONTAINER_FINISHED event is generated but for 
> some containers in a KILLED application this information is missing. Below is 
> a node manager log snippet,
> {code:java}
> 2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver 
> (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application 
> application_1523259757659_0003 removed, cleanupLocalDirs = false
> 2018-04-09 08:44:54,478 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1523259757659_0003 transitioned from 
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher 
> (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been 
> removed before the entity could be published for 
> TimelineEntity[type='YARN_CONTAINER', 
> id='container_1523259757659_0003_01_02']
> 2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just 
> finished : application_1523259757659_0003
> 2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_01. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_02. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager 
> (TimelineCollectorManager.java:remove(192)) - The collector service for 
> application_1523259757659_0003 was removed
> 2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:handle(1572)) - couldn't find application 
> application_1523259757659_0003 while processing FINISH_APPS event. The 
> ResourceManager allocated resources for this application to the NodeManager 
> but no active containers were found to process{code}
> The container id specified in the log, 
> *container_1523259757659_0003_01_02*, is the one whose finished event is 
> missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7598) Document how to use classpath isolation for aux-services in YARN

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432732#comment-16432732
 ] 

genericqa commented on YARN-7598:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
36m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 19 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7598 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918413/YARN-7598.4.patch |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 80cfdff66c5a 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cef8eb7 |
| maven | version: Apache Maven 3.3.9 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/20292/artifact/out/whitespace-tabs.txt
 |
| Max. process+thread count | 341 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20292/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Document how to use classpath isolation for aux-services in YARN
> 
>
> Key: YARN-7598
> URL: https://issues.apache.org/jira/browse/YARN-7598
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7598.2.patch, YARN-7598.3.patch, YARN-7598.4.patch, 
> YARN-7598.trunk.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7825) Maintain constant horizontal application info bar for all pages

2018-04-10 Thread Yesha Vora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-7825:
-
Attachment: Screen Shot 2018-04-10 at 11.06.40 AM.png
Screen Shot 2018-04-10 at 11.07.29 AM.png
Screen Shot 2018-04-10 at 11.07.07 AM.png
Screen Shot 2018-04-10 at 11.06.27 AM.png
Screen Shot 2018-04-10 at 11.15.27 AM.png

> Maintain constant horizontal application info bar for all pages
> ---
>
> Key: YARN-7825
> URL: https://issues.apache.org/jira/browse/YARN-7825
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-04-10 at 11.06.27 AM.png, Screen Shot 
> 2018-04-10 at 11.06.40 AM.png, Screen Shot 2018-04-10 at 11.07.07 AM.png, 
> Screen Shot 2018-04-10 at 11.07.29 AM.png, Screen Shot 2018-04-10 at 11.15.27 
> AM.png
>
>
> Steps:
> 1) Enable ATSv2
> 2) Start a YARN service application (Httpd)
> 3) Fix the horizontal info bar for the pages below:
>  * Component page
>  * Component Instance info page 
>  * Application Attempt info page 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8073) TimelineClientImpl doesn't honor yarn.timeline-service.versions configuration

2018-04-10 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432710#comment-16432710
 ] 

Vrushali C commented on YARN-8073:
--

Hi [~rohithsharma]

Yes, please go ahead with the cherry-pick. The branch-2 Jenkins issue is 
unrelated, and as long as local compilation and unit tests pass, I think we can 
commit it.

thanks
Vrushali

> TimelineClientImpl doesn't honor yarn.timeline-service.versions configuration
> -
>
> Key: YARN-8073
> URL: https://issues.apache.org/jira/browse/YARN-8073
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8073-branch-2.03.patch, YARN-8073.01.patch, 
> YARN-8073.02.patch, YARN-8073.03.patch
>
>
> Post YARN-6736, the RM supports writing into ATS v1 and v2 via the new 
> configuration setting _yarn.timeline-service.versions_. 
>  A couple of issues observed in deployment are:
>  # TimelineClientImpl doesn't honor the newly added configuration; rather, it 
> still gets the version number from _yarn.timeline-service.version_. This causes 
> it not to write into the v1.5 APIs even though _yarn.timeline-service.versions_ 
> has the value 1.5. 
>  # Along the same lines as the 1st point, TimelineUtils#timelineServiceV1_5Enabled 
> doesn't honor timeline-service.versions.
>  # JobHistoryEventHandler#serviceInit(), line no 271, checks the version number 
> rather than calling YarnConfiguration#timelineServiceV2Enabled
> cc :/ [~agresch]
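
A hedged sketch of the yarn-site.xml fragment involved, based only on the property names mentioned above (the exact accepted value format should be checked against yarn-default.xml):
{code:xml}
<!-- New multi-version setting introduced by YARN-6736. -->
<property>
  <name>yarn.timeline-service.versions</name>
  <value>1.5</value>
</property>

<!-- Older single-valued setting that TimelineClientImpl still reads today. -->
<property>
  <name>yarn.timeline-service.version</name>
  <value>1.5</value>
</property>
{code}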



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7931) [atsv2 read acls] Include domain table creation as part of schema creator

2018-04-10 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432688#comment-16432688
 ] 

Vrushali C commented on YARN-7931:
--

Hi [~haibochen]

That's a good question. Let me check what it does; I'll see if I can add a unit 
test to pin down the expected behavior. I will update the JIRA / patch shortly.

thanks
Vrushali


> [atsv2 read acls] Include domain table creation as part of schema creator
> -
>
> Key: YARN-7931
> URL: https://issues.apache.org/jira/browse/YARN-7931
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Vrushali C
>Priority: Major
> Attachments: YARN-7391.0001.patch, YARN-7391.0002.patch, 
> YARN-7391.0003.patch
>
>
>  
> Update the schema creator to create a domain table to store timeline entity 
> domain info. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8127) Resource leak when async scheduling is enabled

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432629#comment-16432629
 ] 

genericqa commented on YARN-8127:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 24s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 23 unchanged - 0 fixed = 24 total (was 23) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 26s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 26s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}138m 55s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoCreatedQueuePreemption
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8127 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918404/YARN-8127.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 281e86cf911b 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cef8eb7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20290/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 

[jira] [Commented] (YARN-7598) Document how to use classpath isolation for aux-services in YARN

2018-04-10 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432601#comment-16432601
 ] 

Xuan Gong commented on YARN-7598:
-

Thanks for the review. [~djp]

Uploaded a new patch to address all your comments.

> Document how to use classpath isolation for aux-services in YARN
> 
>
> Key: YARN-7598
> URL: https://issues.apache.org/jira/browse/YARN-7598
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7598.2.patch, YARN-7598.3.patch, YARN-7598.4.patch, 
> YARN-7598.trunk.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7598) Document how to use classpath isolation for aux-services in YARN

2018-04-10 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-7598:

Attachment: YARN-7598.4.patch

> Document how to use classpath isolation for aux-services in YARN
> 
>
> Key: YARN-7598
> URL: https://issues.apache.org/jira/browse/YARN-7598
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7598.2.patch, YARN-7598.3.patch, YARN-7598.4.patch, 
> YARN-7598.trunk.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens

2018-04-10 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432589#comment-16432589
 ] 

Shane Kumpf commented on YARN-2674:
---

[~chenchun] - Thanks for the patch here. We are seeing this when testing the 
Docker runtime and it results in extra Docker containers being launched on RM 
restart, which is problematic. I've validated that the logic in this patch 
resolves that issue. Any chance you'd be able to update the patch? If you don't 
have the time, I could put up a patch based on your previous patch.

> Distributed shell AM may re-launch containers if RM work preserving restart 
> happens
> ---
>
> Key: YARN-2674
> URL: https://issues.apache.org/jira/browse/YARN-2674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, resourcemanager
>Reporter: Chun Chen
>Assignee: Chun Chen
>Priority: Major
>  Labels: oct16-easy
> Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch, 
> YARN-2674.4.patch, YARN-2674.5.patch
>
>
> Currently, if an RM work-preserving restart happens while distributed shell is 
> running, the distributed shell AM may re-launch all the containers, including 
> new/running/completed ones. We must make sure it won't re-launch the 
> running/completed containers.
> We need to remove allocated containers from 
> AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8134) Support specifying node resources in SLS

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432542#comment-16432542
 ] 

genericqa commented on YARN-8134:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 39s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
32s{color} | {color:green} hadoop-sls in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
48s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-8134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918396/YARN-8134.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  xml  findbugs  checkstyle  |
| uname | Linux 16f633d2cbd4 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cef8eb7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20291/testReport/ |
| Max. process+thread count | 456 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20291/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Support specifying node resources in SLS
> 

[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-10 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432502#comment-16432502
 ] 

Wei Yan commented on YARN-8135:
---

{quote}Think this should be renamed to YARN-Submarine though.

I'm not sure Hadoop-Submarine or YARN-Submarine, let's decide once I finish the 
design. 
{quote}
Hadoop-Submarine may be better here, as the project may not only involve YARN. 
Also, Hadoop-Submarine may be more attractive than YARN-Submarine.

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineers / data scientists to run *unmodified* Tensorflow jobs 
> on YARN.
>  - Allow jobs to easily access data/models in HDFS and other storage.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support running distributed Tensorflow jobs with simple configs.
>  - Support running user-specified Docker images.
>  - Support specifying GPUs and other resources.
>  - Support launching TensorBoard if the user requests it.
>  - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because a submarine is the only vehicle that can take humans to explore deep 
> places. B-)
> Comparison to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU isolation in the XLearning project is achieved by a patched YARN, which is 
> different from the community's GPU isolation solution.
> **XLearning needs a few modifications to read ClusterSpec from the env.
> *References:*
>  - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8135) Hadoop {Submarine} Project: Simple and scalable deployment of deep learning training / serving jobs on Hadoop

2018-04-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432492#comment-16432492
 ] 

Wangda Tan commented on YARN-8135:
--

[~oliverhuh...@gmail.com], 

There are no technical issues preventing a TF application from accessing HDFS. 
But using HDFS is a real overhead if the user has no prior Hadoop experience 
[https://www.tensorflow.org/deploy/hadoop]. I just want to make this step 
easier.

 

[~asuresh], 

Thanks for your interest in this project. I'm not sure whether Hadoop-Submarine 
or YARN-Submarine is the better name; let's decide once I finish the design.

> Hadoop {Submarine} Project: Simple and scalable deployment of deep learning 
> training / serving jobs on Hadoop
> -
>
> Key: YARN-8135
> URL: https://issues.apache.org/jira/browse/YARN-8135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: image-2018-04-09-14-35-16-778.png, 
> image-2018-04-09-14-44-41-101.png
>
>
> Description:
> *Goals:*
>  - Allow infra engineers / data scientists to run *unmodified* Tensorflow 
> jobs on YARN.
>  - Allow jobs easy access to data/models in HDFS and other storage systems.
>  - Can launch services to serve Tensorflow/MXNet models.
>  - Support running distributed Tensorflow jobs with simple configs.
>  - Support running user-specified Docker images.
>  - Support specifying GPU and other resources.
>  - Support launching tensorboard if the user requests it.
>  - Support customized DNS names for roles (like tensorboard.$user.$domain:6006)
> *Why this name?*
>  - Because a submarine is the only vehicle that can let humans explore deep 
> places. B-)
> Compared to other projects:
> !image-2018-04-09-14-44-41-101.png!
> *Notes:*
> *GPU isolation in the XLearning project is achieved by a patched YARN, which 
> differs from the community's GPU isolation solution.
> **XLearning needs a few modifications to read the ClusterSpec from env.
> *References:*
>  - TensorflowOnSpark (Yahoo): [https://github.com/yahoo/TensorFlowOnSpark]
>  - TensorFlowOnYARN (Intel): 
> [https://github.com/Intel-bigdata/TensorFlowOnYARN]
>  - Spark Deep Learning (Databricks): 
> [https://github.com/databricks/spark-deep-learning]
>  - XLearning (Qihoo360): [https://github.com/Qihoo360/XLearning]
>  - Kubeflow (Google): [https://github.com/kubeflow/kubeflow]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7159) Normalize unit of resource objects in RM and avoid to do unit conversion in critical path

2018-04-10 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432477#comment-16432477
 ] 

Manikandan R commented on YARN-7159:


[~sunilg] Can you please take this forward? Thanks.

> Normalize unit of resource objects in RM and avoid to do unit conversion in 
> critical path
> -
>
> Key: YARN-7159
> URL: https://issues.apache.org/jira/browse/YARN-7159
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-7159.001.patch, YARN-7159.002.patch, 
> YARN-7159.003.patch, YARN-7159.004.patch, YARN-7159.005.patch, 
> YARN-7159.006.patch, YARN-7159.007.patch, YARN-7159.008.patch, 
> YARN-7159.009.patch, YARN-7159.010.patch, YARN-7159.011.patch, 
> YARN-7159.012.patch, YARN-7159.013.patch, YARN-7159.015.patch, 
> YARN-7159.016.patch, YARN-7159.017.patch, YARN-7159.018.patch, 
> YARN-7159.019.patch, YARN-7159.020.patch, YARN-7159.021.patch, 
> YARN-7159.022.patch, YARN-7159.023.patch
>
>
> Currently, resource unit conversion can happen in the critical code path when 
> a client specifies a different unit. This can significantly impact the 
> performance and throughput of the RM. We should normalize units when a 
> resource is passed to the RM and avoid the expensive unit conversion on every 
> access.
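
As a rough illustration of the "normalize once at admission" idea, here is a 
minimal, self-contained Java sketch; the class and method names are 
hypothetical and this is not the actual RM code:

{code:java}
// Minimal sketch of normalizing client-supplied units once at admission time,
// so the scheduler's hot path compares plain longs without any conversion.
// Hypothetical types; not the actual YARN implementation.
import java.util.HashMap;
import java.util.Map;

public class ResourceUnitNormalizer {

  // Factors relative to the canonical memory unit (Mi).
  private static final Map<String, Long> TO_MI = new HashMap<>();
  static {
    TO_MI.put("Mi", 1L);
    TO_MI.put("Gi", 1024L);
    TO_MI.put("Ti", 1024L * 1024L);
  }

  /**
   * Convert a client-supplied memory value to the canonical unit (Mi).
   * Called once when the resource request enters the RM.
   */
  public static long normalizeMemory(long value, String unit) {
    Long factor = TO_MI.get(unit);
    if (factor == null) {
      throw new IllegalArgumentException("Unknown memory unit: " + unit);
    }
    return value * factor;
  }

  public static void main(String[] args) {
    // Client asked for 2Gi; after admission the RM only ever sees 2048 (Mi).
    System.out.println(normalizeMemory(2, "Gi"));   // 2048
    System.out.println(normalizeMemory(512, "Mi")); // 512
  }
}
{code}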



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-04-10 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432474#comment-16432474
 ] 

Manikandan R commented on YARN-4606:


[~leftnoteasy] [~sunilg] Can you please share your views?

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are 
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> that user an active user. This can lead to starvation of active applications, 
> for example:
> - App1 (belongs to user1) / app2 (belongs to user2) are active; app3 (belongs 
> to user3) / app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources, so 
> the computed user-limit-resource could be lower than expected.
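
To make the arithmetic concrete, here is a simplified sketch (not the real 
CapacityScheduler user-limit computation) of how counting pending-only users 
inflates the divisor and shrinks the per-user share:

{code:java}
// Simplified illustration only -- the actual CapacityScheduler formula is
// more involved; this just shows the effect of the #active-users divisor.
public class UserLimitSketch {

  /** Very rough per-user share: queue resources divided by #active users. */
  static long userLimit(long queueMemoryMb, int activeUsers) {
    return queueMemoryMb / Math.max(activeUsers, 1);
  }

  public static void main(String[] args) {
    long queueMemoryMb = 100_000; // hypothetical 100 GB queue

    // Counting pending-only users (user3/user4) as active: divisor is 4.
    System.out.println(userLimit(queueMemoryMb, 4)); // 25000 MB per user

    // Counting only users that can actually allocate (user1/user2): divisor is 2.
    System.out.println(userLimit(queueMemoryMb, 2)); // 50000 MB per user
  }
}
{code}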



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-04-10 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432433#comment-16432433
 ] 

genericqa commented on YARN-7221:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} YARN-7221 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-7221 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918397/YARN-7221.021.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20289/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch, YARN-7221.007.patch, YARN-7221.008.patch, 
> YARN-7221.009.patch, YARN-7221.010.patch, YARN-7221.011.patch, 
> YARN-7221.012.patch, YARN-7221.013.patch, YARN-7221.014.patch, 
> YARN-7221.015.patch, YARN-7221.016.patch, YARN-7221.017.patch, 
> YARN-7221.018.patch, YARN-7221.019.patch, YARN-7221.020.patch, 
> YARN-7221.021.patch
>
>
> When a Docker container runs with privileges, the majority use case is to have 
> some program start as root and then drop privileges to another user, e.g. 
> httpd starting privileged to bind to port 80, then dropping privileges to the 
> www user.  
> # We should add a security check for submitting users, to verify they have 
> "sudo" access to run privileged containers.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with both the --privileged=true and --user=uid:gid 
> flags.  With this parameter combination, the user does not have access to 
> become root.  All docker exec commands are dropped to the uid:gid user instead 
> of being granted privileges.  A user can gain root privileges if the container 
> file system contains files that grant extra power, but that type of image is 
> considered dangerous.  A non-privileged user can launch a container with 
> special bits to acquire the same level of root power.  Hence, we lose control 
> over which images should be run with --privileged and who has sudo rights to 
> use privileged container images.  As a result, we should check for sudo access 
> and then decide whether to parameterize --privileged=true OR --user=uid:gid.  
> This will avoid leading developers down the wrong path.
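
As a rough sketch of the proposed decision (check sudo access, then pass either 
--privileged=true or --user=uid:gid, never both), here is a hypothetical, 
self-contained Java example; it is not the actual container-executor code and 
all names are made up:

{code:java}
// Hypothetical sketch of "check sudo, then choose the flag" -- not the real
// YARN container-executor implementation.
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PrivilegedDockerArgsSketch {

  /**
   * Decide between --privileged=true and --user=uid:gid for a docker run
   * command. Only users in the (assumed) sudoers allow-list may request a
   * privileged container; everyone else is pinned to their own uid:gid.
   */
  static List<String> dockerRunArgs(String user, String uidGid,
      boolean privilegedRequested, Set<String> sudoUsers) {
    List<String> args = new ArrayList<>();
    args.add("docker");
    args.add("run");
    if (privilegedRequested) {
      if (!sudoUsers.contains(user)) {
        throw new SecurityException(
            "User " + user + " is not allowed to run privileged containers");
      }
      // Privileged container: do NOT add --user, so the image can start as
      // root and drop privileges itself (e.g. httpd binding to port 80).
      args.add("--privileged=true");
    } else {
      // Non-privileged container: pin to the submitting user's uid:gid.
      args.add("--user=" + uidGid);
    }
    return args;
  }

  public static void main(String[] args) {
    Set<String> sudoUsers = Set.of("admin");
    System.out.println(dockerRunArgs("admin", "1000:1000", true, sudoUsers));
    System.out.println(dockerRunArgs("alice", "1001:1001", false, sudoUsers));
  }
}
{code}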



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


