[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups

2018-05-16 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-4599:
-
Attachment: YARN-4599.015.patch

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Miklos Szegedi
>Priority: Major
>  Labels: oct16-medium
> Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, 
> YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, 
> YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, 
> YARN-4599.007.patch, YARN-4599.008.patch, YARN-4599.009.patch, 
> YARN-4599.010.patch, YARN-4599.011.patch, YARN-4599.012.patch, 
> YARN-4599.013.patch, YARN-4599.014.patch, YARN-4599.015.patch, 
> YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds support for enforcing memory limits with cgroups. We should 
> also explicitly set the OOM control so that containers are not killed as 
> soon as they go over their limit. Today, one could set the swappiness to 
> control this, but clusters with swap turned off exist.
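
For context, a minimal hedged sketch of the cgroup v1 kernel interface this refers to, not the patch itself: writing "1" to a cgroup's memory.oom_control disables the kernel OOM killer for that cgroup, so processes are paused under memory pressure instead of being killed immediately. The cgroup path below is hypothetical; the real path depends on the node manager's cgroup hierarchy settings.

{code:java}
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OomControlSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical container cgroup path for illustration only.
    Path oomControl = Paths.get(
        "/sys/fs/cgroup/memory/hadoop-yarn/container_1/memory.oom_control");
    // "1" disables the kernel OOM killer for this cgroup; tasks that exceed
    // the limit are frozen until memory is freed or they are killed
    // explicitly (e.g. by the node manager), rather than killed immediately.
    Files.write(oomControl, "1".getBytes(StandardCharsets.UTF_8));
  }
}
{code}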






[jira] [Resolved] (YARN-8302) ATS v2 should handle HBase connection issue properly

2018-05-16 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved YARN-8302.
-
Resolution: Won't Fix

Closing this JIRA as Won't Fix since it is a configuration issue. The HBase 
client timeout can be decreased by tuning the configurations mentioned in the 
comments above.

> ATS v2 should handle HBase connection issue properly
> 
>
> Key: YARN-8302
> URL: https://issues.apache.org/jira/browse/YARN-8302
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Priority: Major
>
> ATS v2 calls time out with the below error when they can't connect to the 
> HBase instance.
> {code}
> bash-4.2$ curl -i -k -s -1  -H 'Content-Type: application/json'  -H 'Accept: 
> application/json' --max-time 5   --negotiate -u : 
> 'https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/YARN_CONTAINER?fields=ALL&_=1526425686092'
> curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
> {code}
> {code:title=ATS log}
> 2018-05-15 23:10:03,623 INFO  client.RpcRetryingCallerImpl 
> (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=7, 
> retries=7, started=8165 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 
> failed on connection exception: 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  Connection refused: xxx/xxx:17020, details=row 
> 'prod.timelineservice.app_flow,
> ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
> hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:13,651 INFO  client.RpcRetryingCallerImpl 
> (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=8, 
> retries=8, started=18192 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 
> failed on connection exception: 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  Connection refused: xxx/xxx:17020, details=row 
> 'prod.timelineservice.app_flow,
> ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
> hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:23,730 INFO  client.RpcRetryingCallerImpl 
> (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=9, 
> retries=9, started=28272 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 
> failed on connection exception: 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  Connection refused: xxx/xxx:17020, details=row 
> 'prod.timelineservice.app_flow,
> ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
> hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:33,788 INFO  client.RpcRetryingCallerImpl 
> (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=10, 
> retries=10, started=38330 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 
> failed on connection exception: 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  Connection refused: xxx/xxx:17020, details=row 
> 'prod.timelineservice.app_flow,
> ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
> hostname=xxx,17020,1526348294182, seqNum=-1{code}
> There are two issues here:
> 1) Check why ATS can't connect to HBase.
> 2) In case of a connection error, the ATS call should not time out; it 
> should fail with a proper error.






[jira] [Commented] (YARN-8302) ATS v2 should handle HBase connection issue properly

2018-05-16 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478532#comment-16478532
 ] 

Rohith Sharma K S commented on YARN-8302:
-

If HBase is down for any reason, the HBase client will retry for about 20 
minutes with the default configuration loaded. Reducing the default value of 
*hbase.client.retries.number* from 15 to 7 drastically decreased the retry 
window from roughly 20 minutes to about 1.5 minutes.
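
For illustration, a hedged sketch of lowering this retry count programmatically on the HBase client configuration that the timeline reader would use; in practice the property would normally be set in the hbase-site.xml on the reader's classpath. The value 7 and the default of 15 are the ones mentioned above.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RetryTuningSketch {
  public static void main(String[] args) {
    // Loads hbase-default.xml / hbase-site.xml from the classpath.
    Configuration conf = HBaseConfiguration.create();
    // Default is 15 (per the comment above); 7 shrinks the worst-case
    // retry window from roughly 20 minutes to about 1.5 minutes.
    conf.setInt("hbase.client.retries.number", 7);
  }
}
{code}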

> ATS v2 should handle HBase connection issue properly
> 
>
> Key: YARN-8302
> URL: https://issues.apache.org/jira/browse/YARN-8302
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Priority: Major
>
> ATS v2 calls time out with the below error when they can't connect to the 
> HBase instance.
> {code}
> bash-4.2$ curl -i -k -s -1  -H 'Content-Type: application/json'  -H 'Accept: 
> application/json' --max-time 5   --negotiate -u : 
> 'https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/YARN_CONTAINER?fields=ALL&_=1526425686092'
> curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
> {code}
> {code:title=ATS log}
> 2018-05-15 23:10:03,623 INFO  client.RpcRetryingCallerImpl 
> (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=7, 
> retries=7, started=8165 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 
> failed on connection exception: 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  Connection refused: xxx/xxx:17020, details=row 
> 'prod.timelineservice.app_flow,
> ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
> hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:13,651 INFO  client.RpcRetryingCallerImpl 
> (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=8, 
> retries=8, started=18192 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 
> failed on connection exception: 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  Connection refused: xxx/xxx:17020, details=row 
> 'prod.timelineservice.app_flow,
> ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
> hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:23,730 INFO  client.RpcRetryingCallerImpl 
> (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=9, 
> retries=9, started=28272 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 
> failed on connection exception: 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  Connection refused: xxx/xxx:17020, details=row 
> 'prod.timelineservice.app_flow,
> ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
> hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:33,788 INFO  client.RpcRetryingCallerImpl 
> (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=10, 
> retries=10, started=38330 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 
> failed on connection exception: 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>  Connection refused: xxx/xxx:17020, details=row 
> 'prod.timelineservice.app_flow,
> ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
> hostname=xxx,17020,1526348294182, seqNum=-1{code}
> There are two issues here:
> 1) Check why ATS can't connect to HBase.
> 2) In case of a connection error, the ATS call should not time out; it 
> should fail with a proper error.






[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2018-05-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478490#comment-16478490
 ] 

genericqa commented on YARN-4599:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 20m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
33m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 28m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
25s{color} | {color:green} root: The patch generated 0 new + 235 unchanged - 1 
fixed = 235 total (was 236) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 19m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 11s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}174m 38s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}377m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestPread |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-4599 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923724/YAR

[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups

2018-05-16 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-4599:
-
Attachment: YARN-4599.014.patch

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Miklos Szegedi
>Priority: Major
>  Labels: oct16-medium
> Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, 
> YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, 
> YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, 
> YARN-4599.007.patch, YARN-4599.008.patch, YARN-4599.009.patch, 
> YARN-4599.010.patch, YARN-4599.011.patch, YARN-4599.012.patch, 
> YARN-4599.013.patch, YARN-4599.014.patch, YARN-4599.sandflee.patch, 
> yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds support for enforcing memory limits with cgroups. We should 
> also explicitly set the OOM control so that containers are not killed as 
> soon as they go over their limit. Today, one could set the swappiness to 
> control this, but clusters with swap turned off exist.






[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups

2018-05-16 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-4599:
-
Attachment: YARN-4599.013.patch

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Miklos Szegedi
>Priority: Major
>  Labels: oct16-medium
> Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, 
> YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, 
> YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, 
> YARN-4599.007.patch, YARN-4599.008.patch, YARN-4599.009.patch, 
> YARN-4599.010.patch, YARN-4599.011.patch, YARN-4599.012.patch, 
> YARN-4599.013.patch, YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds support for enforcing memory limits with cgroups. We should 
> also explicitly set the OOM control so that containers are not killed as 
> soon as they go over their limit. Today, one could set the swappiness to 
> control this, but clusters with swap turned off exist.






[jira] [Commented] (YARN-8234) Improve RM system metrics publisher's performance by pushing events to timeline server in batch

2018-05-16 Thread Hu Ziqian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478467#comment-16478467
 ] 

Hu Ziqian commented on YARN-8234:
-

Hi  [~leftnoteasy], 

hadoop.yarn.server.resourcemanager.TestClientRMTokens   
hadoop.yarn.server.resourcemanager.TestAMAuthorization   
hadoop.yarn.server.resourcemanager.scheduler.fair.TestSchedulingPolicy

These tests pass on my laptop; I have no idea why they failed in genericqa. I 
also checked these tests, and none of them uses SystemMetricsPublisher, which 
is what my patch changes.

The other test and findbugs issues are all fixed.

> Improve RM system metrics publisher's performance by pushing events to 
> timeline server in batch
> ---
>
> Key: YARN-8234
> URL: https://issues.apache.org/jira/browse/YARN-8234
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.8.3
>Reporter: Hu Ziqian
>Assignee: Hu Ziqian
>Priority: Major
> Attachments: YARN-8234-branch-2.8.3.001.patch, 
> YARN-8234-branch-2.8.3.002.patch, YARN-8234-branch-2.8.3.003.patch
>
>
> When the system metrics publisher is enabled, the RM pushes events to the 
> timeline server via its REST API. If the cluster load is heavy, many events 
> are sent to the timeline server and the timeline server's event handler 
> thread gets locked; YARN-7266 discusses the details of this problem. Because 
> of the lock, the timeline server can't receive events as fast as the RM 
> generates them, and lots of timeline events stay in the RM's memory. 
> Eventually, those events consume all of the RM's memory and the RM starts a 
> full GC (which causes a JVM stop-the-world pause and a timeout from the RM 
> to ZooKeeper) or even hits an OOM.
> The main problem is that the timeline server can't receive events as fast as 
> they are generated. Today, the RM system metrics publisher puts only one 
> event in each request, and most of the time is spent handling HTTP headers 
> and connection setup on the timeline side; only a little time is spent on 
> the timeline event itself, which is the truly valuable work.
> In this issue, we add a buffer to the system metrics publisher and let the 
> publisher send events to the timeline server in batches via one request. 
> With the batch size set to 1000, in our experiment the rate at which the 
> timeline server receives events improved by 100x. We have implemented this 
> in our production environment, which accepts 2 apps in one hour, and it 
> works fine.
> We add the following configuration properties (see the sketch after this 
> list):
>  * yarn.resourcemanager.system-metrics-publisher.batch-size: the number of 
> events the system metrics publisher sends in one request. The default value 
> is 1000.
>  * yarn.resourcemanager.system-metrics-publisher.buffer-size: the size of 
> the event buffer in the system metrics publisher.
>  * yarn.resourcemanager.system-metrics-publisher.interval-seconds: when 
> batch publishing is enabled, we must avoid the publisher waiting for a batch 
> to fill up and holding events in the buffer for a long time, so we add 
> another thread that sends the buffered events periodically. This config sets 
> the interval of that periodic sending thread. The default value is 60s.
>  
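
To make the proposal concrete, a hedged sketch of setting the proposed properties programmatically. The property names and the stated defaults are taken from the description above; these keys only exist once the patch is applied, and the buffer-size default is not stated, so it is left as a placeholder.

{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class BatchPublisherConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Proposed by this patch; not present in stock YARN.
    conf.setInt(
        "yarn.resourcemanager.system-metrics-publisher.batch-size", 1000);
    // Flush interval for the periodic sender thread (default 60s above).
    conf.setLong(
        "yarn.resourcemanager.system-metrics-publisher.interval-seconds", 60);
    // buffer-size has no default stated in the description; choose per
    // cluster, e.g.:
    // conf.setInt(
    //     "yarn.resourcemanager.system-metrics-publisher.buffer-size", ...);
  }
}
{code}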






[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2018-05-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478460#comment-16478460
 ] 

genericqa commented on YARN-4599:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
30m 45s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m  
1s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 26m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 
16s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 35s{color} | {color:orange} root: The patch generated 1 new + 235 unchanged 
- 1 fixed = 236 total (was 236) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 17m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
8m 36s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m  9s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}196m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.util.TestBasicDiskValidator |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-4599 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attach

[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478425#comment-16478425
 ] 

genericqa commented on YARN-8292:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
9s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
13s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 25s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 9 new + 38 unchanged - 0 fixed = 47 total (was 38) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 38s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
32s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 16s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}156m 47s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8292 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923819/YARN-8292.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 290f3268becf 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/persona

[jira] [Commented] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart

2018-05-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478394#comment-16478394
 ] 

genericqa commented on YARN-8290:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 29s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 62m 
32s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}120m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8290 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923820/YARN-8290.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f6bb0e1a6c56 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / be53969 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/20759/artifact/out/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20759/testReport/ |
| Max. process+thread count | 799 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/Pre

[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478387#comment-16478387
 ] 

genericqa commented on YARN-8292:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
20s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 19s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 9 new + 42 unchanged - 0 fixed = 51 total (was 42) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
10s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m  2s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}155m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoCreatedQueuePreemption
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8292 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923810/YARN-8292.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 49598110aae6 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 
19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Per

[jira] [Commented] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats

2018-05-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478333#comment-16478333
 ] 

genericqa commented on YARN-8310:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 30s{color} 
| {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common generated 1 
new + 33 unchanged - 0 fixed = 34 total (was 33) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 21s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 5 new + 
35 unchanged - 0 fixed = 40 total (was 35) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 56s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
11s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
20s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 54m 21s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8310 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923816/YARN-8310.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4d6cc1cdb009 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / be53969 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/20758/artifact/out/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20758/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20758/testReport/ |
| asflicense

[jira] [Commented] (YARN-8080) YARN native service should support component restart policy

2018-05-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478317#comment-16478317
 ] 

genericqa commented on YARN-8080:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
35s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 15 new + 121 unchanged - 2 fixed = 136 total (was 123) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m  
9s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
40s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
20s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 94m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8080 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attac

[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-16 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478311#comment-16478311
 ] 

Eric Yang commented on YARN-8141:
-

[~csingh] Thank you for the patch. A few nits:

The FindAbsoluteMount method can be skipped. Container-executor already looks 
up the source mount location and compares the absolute path to the 
whitelisted mounts (YARN-5534).

DockerContainers.md still has a reference to 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS. This can be removed as 
well.

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch, YARN-8141.004.patch
>
>
> The existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether 
> the user specified it in the service spec. It is important to allow users to 
> mount local files and folders such as /etc/passwd.
> The following logic overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
> sb.toString());{code}
> Inside AbstractLauncher.java
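
One possible shape of the fix, as a hedged sketch rather than what the attached patches necessarily do: append the launcher's mounts to any user-specified value instead of overwriting it. The addMounts helper below is hypothetical and only illustrates the merge.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class MountEnvMergeSketch {
  static final String MOUNTS_ENV =
      "YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS";

  // Hypothetical helper: build the src:dst list while preserving whatever
  // the user already put in the service spec.
  static void addMounts(Map<String, String> env,
      Map<String, String> mountPaths) {
    StringBuilder sb = new StringBuilder();
    String existing = env.get(MOUNTS_ENV);
    if (existing != null && !existing.isEmpty()) {
      sb.append(existing);
    }
    for (Map.Entry<String, String> mount : mountPaths.entrySet()) {
      if (sb.length() > 0) {
        sb.append(",");
      }
      sb.append(mount.getKey()).append(":").append(mount.getValue());
    }
    env.put(MOUNTS_ENV, sb.toString());
  }

  public static void main(String[] args) {
    Map<String, String> env = new LinkedHashMap<>();
    env.put(MOUNTS_ENV, "/etc/passwd:/etc/passwd");
    Map<String, String> mounts = new LinkedHashMap<>();
    mounts.put("/local/dir", "/container/dir");
    addMounts(env, mounts);
    // Prints: /etc/passwd:/etc/passwd,/local/dir:/container/dir
    System.out.println(env.get(MOUNTS_ENV));
  }
}
{code}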






[jira] [Assigned] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart

2018-05-16 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reassigned YARN-8290:
---

 Assignee: Eric Yang
Affects Version/s: 3.1.1

[~leftnoteasy] According to your suggestion, the ACL information is set too 
late, and killing the AM before the ACL information has propagated can cause 
RM recovery to load a partial application record. The suggested change is to 
move the ACL setup into ApplicationToSchedulerTransition. The patch moves the 
block of code accordingly. Let me know if this is the correct fix. Thanks.

> Yarn application failed to recover with "Error Launching job : User is not 
> set in the application report" error after RM restart
> 
>
> Key: YARN-8290
> URL: https://issues.apache.org/jira/browse/YARN-8290
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8290.001.patch
>
>
> Scenario:
> 1) Start 5 streaming applications in the background.
> 2) Kill the active RM and cause an RM failover.
> After the RM failover, the applications failed with the below error.
> {code}18/02/01 21:24:29 WARN client.RequestHedgingRMFailoverProxyProvider: 
> Invocation returned exception on [rm2] : 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1517520038847_0003' doesn't exist in RM. Please check 
> that the job submission was successful.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:338)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
> , so propagating back to caller.
> 18/02/01 21:24:29 INFO impl.YarnClientImpl: Submitted application 
> application_1517520038847_0003
> 18/02/01 21:24:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/hrt_qa/.staging/job_1517520038847_0003
> 18/02/01 21:24:30 ERROR streaming.StreamJob: Error Launching job : User is 
> not set in the application report
> Streaming Command Failed!{code}
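
For context, a hedged sketch of how a client observes this state: after submission the client polls the application report, and an application recovered from a partial record can come back with no user set. The application id is the one from the log above; the null check mirrors the error message, not the exact streaming client code.

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ReportUserCheckSketch {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    YarnClient client = YarnClient.createYarnClient();
    client.init(conf);
    client.start();
    try {
      ApplicationId appId =
          ApplicationId.fromString("application_1517520038847_0003");
      ApplicationReport report = client.getApplicationReport(appId);
      if (report.getUser() == null) {
        // This is the condition the streaming job trips over after the
        // failover described above.
        throw new IllegalStateException(
            "User is not set in the application report");
      }
    } finally {
      client.stop();
    }
  }
}
{code}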






[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups

2018-05-16 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-4599:
-
Attachment: YARN-4599.012.patch

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Miklos Szegedi
>Priority: Major
>  Labels: oct16-medium
> Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, 
> YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, 
> YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, 
> YARN-4599.007.patch, YARN-4599.008.patch, YARN-4599.009.patch, 
> YARN-4599.010.patch, YARN-4599.011.patch, YARN-4599.012.patch, 
> YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroups enforcing support. We should also explicitly 
> set OOM control so that containers are not killed as soon as they go over 
> their usage. Today, one could set the swappiness to control this, but 
> clusters with swap turned off exist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart

2018-05-16 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8290:

Attachment: YARN-8290.001.patch

> Yarn application failed to recover with "Error Launching job : User is not 
> set in the application report" error after RM restart
> 
>
> Key: YARN-8290
> URL: https://issues.apache.org/jira/browse/YARN-8290
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Priority: Major
> Attachments: YARN-8290.001.patch
>
>
> Scenario:
> 1) Start 5 streaming applications in the background
> 2) Kill the active RM and cause an RM failover
> After the RM failover, the application failed with the error below.
> {code}18/02/01 21:24:29 WARN client.RequestHedgingRMFailoverProxyProvider: 
> Invocation returned exception on [rm2] : 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1517520038847_0003' doesn't exist in RM. Please check 
> that the job submission was successful.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:338)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
> , so propagating back to caller.
> 18/02/01 21:24:29 INFO impl.YarnClientImpl: Submitted application 
> application_1517520038847_0003
> 18/02/01 21:24:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/hrt_qa/.staging/job_1517520038847_0003
> 18/02/01 21:24:30 ERROR streaming.StreamJob: Error Launching job : User is 
> not set in the application report
> Streaming Command Failed!{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478266#comment-16478266
 ] 

Wangda Tan commented on YARN-8292:
--

[~jlowe],
I think you're correct :). I take my words back; my previous assumption:

{code}
Σ_{selected-container}(selected-container.resource) <= Σ_{queue}(queue.to-be-obtain)
(for all resource types)
{code}

can break the case where one starving queue needs to preempt containers from 
two over-utilized queues.
For example:
{code}
queue-A, 
guaranteed: <30,50> , used: <40, 60>.

queue-B,
guaranteed: <30,50>, used: <40, 60>
{code} 

Assume we have a queue C that wants 20:20 resources.
So in this case, for both queue-A and queue-B, the resource to obtain is 10:10.

If every container running on the system has the same size, 20:30, then under my 
existing approach nothing can be preempted. This is also why some unit tests failed.

I just used your approach: 
bq. I think the check for a zero resource can be dropped and it simplifies to 
the toObtainAfterPreemption component-wise max'd with zero is less than the 
amount to obtain from the partition (after being max'd with zero).

With the zero-resource-type check I commented on above:
{code}
// If a toObtain resource type == 0, set it to -1 to avoid 0 resource
// type affect following doPreemption check: isAnyMajorResourceZero
for (ResourceInformation ri : toObtainByPartition.getResources()) {
  if (ri.getValue() == 0) {
ri.setValue(-1);
  }
}
{code} 

Now everything works. Please take a look at the attached patch (ver.3) and let 
me know if it looks good.
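
For readers following the thread, here is a minimal, self-contained sketch of 
the two ingredients above, using plain int arrays instead of the real 
Resource/ResourceInformation classes. The class and method names are invented 
for illustration; this is not the patch itself.

{code:java}
import java.util.Arrays;

/** Illustrative sketch only. A resource vector is a plain int array such as
 *  {memory, vcores, gpu}. */
public class ZeroToObtainSketch {

  /** Mirrors the snippet above: a to-obtain entry of 0 is rewritten to -1 so
   *  an always-zero resource type cannot distort later "is any major resource
   *  zero" style checks. */
  static int[] markZeroTypes(int[] toObtain) {
    int[] normalized = Arrays.copyOf(toObtain, toObtain.length);
    for (int i = 0; i < normalized.length; i++) {
      if (normalized[i] == 0) {
        normalized[i] = -1;
      }
    }
    return normalized;
  }

  /** Preemption keeps going only while something positive is left to obtain. */
  static boolean anythingLeftToObtain(int[] toObtain) {
    for (int value : toObtain) {
      if (value > 0) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Queue-A from the example: resource to obtain = 10:10, third type unused.
    int[] toObtain = markZeroTypes(new int[] {10, 10, 0});
    System.out.println(Arrays.toString(toObtain));      // [10, 10, -1]
    System.out.println(anythingLeftToObtain(toObtain)); // true
  }
}
{code}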

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running; each container is 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending.
> Under the existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8292:
-
Attachment: YARN-8292.003.patch

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running; each container is 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending.
> Under the existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats

2018-05-16 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478258#comment-16478258
 ] 

Robert Kanter commented on YARN-8310:
-

The patch adds back code very similar to the original reading code for each of 
the three tokens. It gets called if an {{InvalidProtocolBufferException}} is 
thrown while trying to parse the identifier as a protobuf. Tests are also added.

I also had to add code to handle the case where the old 
{{ContainerTokenIdentifier}} format ends early, because YARN-2581 added a 
{{LogAggregationContext}} field to the end, which might not be present.
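
For readers who want the shape of the fallback, here is a generic sketch. The 
{{parseNewFormat}}/{{readOldFormat}} methods are hypothetical stand-ins rather 
than the real identifier code; the pattern is simply to try the protobuf parse 
first and re-read the same bytes with the legacy reader when an 
{{InvalidProtocolBufferException}} is thrown.

{code:java}
import com.google.protobuf.InvalidProtocolBufferException;

import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

/** Illustrative sketch only; parseNewFormat/readOldFormat are hypothetical
 *  stand-ins for the real protobuf and pre-YARN-668 readers. */
public abstract class TokenIdentifierReaderSketch<T> {

  protected abstract T parseNewFormat(byte[] bytes) throws InvalidProtocolBufferException;

  protected abstract T readOldFormat(DataInput in) throws IOException;

  /** Try the protobuf format first; fall back to the legacy format only when
   *  the bytes are clearly not a valid protobuf message. */
  public T read(byte[] bytes) throws IOException {
    try {
      return parseNewFormat(bytes);
    } catch (InvalidProtocolBufferException e) {
      // Old-format identifiers were written Writable-style, so re-read the
      // same bytes from the beginning with a DataInput-based reader.
      return readOldFormat(new DataInputStream(new ByteArrayInputStream(bytes)));
    }
  }
}
{code}

In such a sketch, the old-format reader would additionally need to tolerate the 
stream ending early, which is the {{LogAggregationContext}} case described above.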

> Handle old NMTokenIdentifier, AMRMTokenIdentifier, and 
> ContainerTokenIdentifier formats
> ---
>
> Key: YARN-8310
> URL: https://issues.apache.org/jira/browse/YARN-8310
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Major
> Attachments: YARN-8310.001.patch, YARN-8310.branch-2.001.patch
>
>
> In some recent upgrade testing, we saw this error causing the NodeManager to 
> fail to startup afterwards:
> {noformat}
> org.apache.hadoop.service.ServiceStateException: 
> com.google.protobuf.InvalidProtocolBufferException: Protocol message 
> contained an invalid tag (zero).
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895)
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol 
> message contained an invalid tag (zero).
>   at 
> com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
>   at 
> com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1860)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1824)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011)
>   at 
> com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686)
>   at 
> org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254)
>   at 
> org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177)
>   at 
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   ... 5 more
> {noformat}
> The NodeManager fails because it's trying to read a 
> {{ContainerTokenIdentifier}} in the "old" format before we changed them to 
> protobufs (YARN-668).  This is very similar to YARN-5594 where we ran into a 
> similar problem with the ResourceManager and RM Delegation Tokens.
> To provide a better experience, we should make the code able to read the old 
> format if it's unable to read it using the new format.  We didn't run into 
> any errors with the other two types of tokens that YARN-668 incompatibly 
> changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix 
> those while we're at it.

[jira] [Updated] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats

2018-05-16 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-8310:

Attachment: YARN-8310.001.patch
YARN-8310.branch-2.001.patch

> Handle old NMTokenIdentifier, AMRMTokenIdentifier, and 
> ContainerTokenIdentifier formats
> ---
>
> Key: YARN-8310
> URL: https://issues.apache.org/jira/browse/YARN-8310
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Major
> Attachments: YARN-8310.001.patch, YARN-8310.branch-2.001.patch
>
>
> In some recent upgrade testing, we saw this error causing the NodeManager to 
> fail to startup afterwards:
> {noformat}
> org.apache.hadoop.service.ServiceStateException: 
> com.google.protobuf.InvalidProtocolBufferException: Protocol message 
> contained an invalid tag (zero).
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895)
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol 
> message contained an invalid tag (zero).
>   at 
> com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
>   at 
> com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1860)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1824)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011)
>   at 
> com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>   at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686)
>   at 
> org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254)
>   at 
> org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177)
>   at 
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   ... 5 more
> {noformat}
> The NodeManager fails because it's trying to read a 
> {{ContainerTokenIdentifier}} in the "old" format before we changed them to 
> protobufs (YARN-668).  This is very similar to YARN-5594 where we ran into a 
> similar problem with the ResourceManager and RM Delegation Tokens.
> To provide a better experience, we should make the code able to read the old 
> format if it's unable to read it using the new format.  We didn't run into 
> any errors with the other two types of tokens that YARN-668 incompatibly 
> changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix 
> those while we're at it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats

2018-05-16 Thread Robert Kanter (JIRA)
Robert Kanter created YARN-8310:
---

 Summary: Handle old NMTokenIdentifier, AMRMTokenIdentifier, and 
ContainerTokenIdentifier formats
 Key: YARN-8310
 URL: https://issues.apache.org/jira/browse/YARN-8310
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter


In some recent upgrade testing, we saw this error causing the NodeManager to 
fail to startup afterwards:
{noformat}
org.apache.hadoop.service.ServiceStateException: 
com.google.protobuf.InvalidProtocolBufferException: Protocol message contained 
an invalid tag (zero).
at 
org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message 
contained an invalid tag (zero).
at 
com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
at 
com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
at 
org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1860)
at 
org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1824)
at 
org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016)
at 
org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011)
at 
com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at 
org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686)
at 
org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254)
at 
org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177)
at 
org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
... 5 more
{noformat}
The NodeManager fails because it's trying to read a 
{{ContainerTokenIdentifier}} in the "old" format before we changed them to 
protobufs (YARN-668).  This is very similar to YARN-5594 where we ran into a 
similar problem with the ResourceManager and RM Delegation Tokens.

To provide a better experience, we should make the code able to read the old 
format if it's unable to read it using the new format.  We didn't run into any 
errors with the other two types of tokens that YARN-668 incompatibly changed 
(NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix those while 
we're at it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478246#comment-16478246
 ] 

Wangda Tan commented on YARN-8292:
--

Attached ver.2 patch, which fixes the zero-value rare-resource problem 
mentioned by Jason.

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running; each container is 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending.
> Under the existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8292:
-
Attachment: YARN-8292.002.patch

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running; each container is 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending.
> Under the existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478244#comment-16478244
 ] 

Wangda Tan commented on YARN-8292:
--

[~eepayne],
It is actually turned on in {{setup()}}. :)

[~jlowe],
I understand your suggestion now, but simply dropping the check as you mentioned:
bq. I think the check for a zero resource can be dropped and it simplifies to 
the toObtainAfterPreemption component-wise max'd with zero is less than the 
amount to obtain from the partition (after being max'd with zero).
is not enough.

The reason is that we want to make sure no over-preemption happens. For example, 
if res-to-obtain = (3, 0, 0) and a container has size = (4, 1, 0) (the 3rd type 
is 0 for both), we don't want the preemption to happen, because it would leave 
the queue under-utilized and could preempt more containers than required. We 
need to make sure that:
{code}
Σ_{selected-container}(selected-container.resource) <= Σ_{queue}(queue.to-be-obtain)
(for all resource types)
{code}

In my previous patch, as you mentioned, if some resource type is always 0, it 
invalidates the check. So I added the following check:
{code}
// If a toObtain resource type == 0, set it to -1 to avoid 0 resource
// type affect following doPreemption check: isAnyMajorResourceZero
for (ResourceInformation ri : toObtainByPartition.getResources()) {
  if (ri.getValue() == 0) {
ri.setValue(-1);
  }
}
{code} 

before the existing check:
{code}
if (Resources.greaterThan(rc, clusterResource, toObtainByPartition,
    Resources.none())) {
{code}

It looks like this solves the problem. Please let me know if you think 
differently.
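
As an aside for readers, the invariant above can be illustrated with a tiny 
sketch (invented names, plain int arrays, not the scheduler code): a container 
is only selected while the running total of selected resources stays 
component-wise within the queue's to-obtain amount.

{code:java}
/** Illustrative sketch only; not CapacityScheduler code. A resource vector is
 *  a plain int array such as {memory, vcores, gpu}. */
public class OverPreemptionGuardSketch {

  /** Selecting this container must not push the total selected resources past
   *  the queue's to-obtain amount on any resource type. */
  static boolean fitsWithinToObtain(int[] alreadySelected, int[] container,
      int[] toObtain) {
    for (int i = 0; i < toObtain.length; i++) {
      if (alreadySelected[i] + container[i] > toObtain[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    int[] toObtain = {3, 0, 0};   // res-to-obtain from the example above
    int[] container = {4, 1, 0};  // candidate container size
    int[] selected = {0, 0, 0};   // nothing selected yet
    // Prints false: preempting this container would exceed the to-obtain
    // amount on the first two resource types and leave the queue under-utilized.
    System.out.println(fitsWithinToObtain(selected, container, toObtain));
  }
}
{code}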

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running; each container is 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending.
> Under the existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8080) YARN native service should support component restart policy

2018-05-16 Thread Suma Shivaprasad (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478235#comment-16478235
 ] 

Suma Shivaprasad commented on YARN-8080:


Thanks [~eyang]. I have addressed the review comments on terminating according 
to the restart policy and fixed most of the checkstyle issues.

> YARN native service should support component restart policy
> ---
>
> Key: YARN-8080
> URL: https://issues.apache.org/jira/browse/YARN-8080
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8080.001.patch, YARN-8080.002.patch, 
> YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, 
> YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, 
> YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch, 
> YARN-8080.014.patch, YARN-8080.015.patch, YARN-8080.016.patch
>
>
> The existing native service assumes the service is long running and never 
> finishes; containers are restarted even if the exit code == 0. 
> To support broader use cases, we need to allow the restart policy of a 
> component to be specified by users. Proposed policies:
> 1) Always: containers are always restarted by the framework regardless of the 
> container exit status. This is the existing/default behavior.
> 2) Never: do not restart containers in any case after the container finishes. 
> This supports job-like workloads (for example a Tensorflow training job): if a 
> task exits with code == 0, we should not restart it. It can be used by 
> services which are not restartable/recoverable.
> 3) On-failure: similar to the above, but only restart tasks with exit code != 0. 
> Behavior after a component *instance* finalizes (Succeeded or Failed when 
> restart_policy != ALWAYS): 
> 1) For a single component, single instance: complete the service.
> 2) For a single component, multiple instances: other running instances of the 
> same component are not affected by the finalized instance. The service is 
> terminated once all instances have finalized. 
> 3) For multiple components: the service is terminated once all components have 
> finalized.
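
As a rough illustration of the proposed behaviour described above (a sketch 
only; the enum and methods below are invented and are not the service 
framework's API):

{code:java}
/** Illustrative sketch of the proposed restart policies; not the actual
 *  YARN native service API. */
public class RestartPolicySketch {

  enum RestartPolicy { ALWAYS, NEVER, ON_FAILURE }

  /** Should this component instance be restarted after its container exits? */
  static boolean shouldRestart(RestartPolicy policy, int exitCode) {
    switch (policy) {
      case ALWAYS:
        return true;            // existing/default behaviour
      case NEVER:
        return false;           // job-like workloads are never restarted
      case ON_FAILURE:
        return exitCode != 0;   // restart only failed instances
      default:
        throw new IllegalStateException("unknown policy " + policy);
    }
  }

  /** Under a non-ALWAYS policy the service as a whole terminates once every
   *  component instance has finalized (succeeded or failed). */
  static boolean serviceShouldTerminate(int totalInstances, int finalizedInstances) {
    return finalizedInstances == totalInstances;
  }
}
{code}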



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8309) Diagnostic message for yarn service app failure due token renewal should be improved

2018-05-16 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8309:


 Summary: Diagnostic message for yarn service app failure due token 
renewal should be improved
 Key: YARN-8309
 URL: https://issues.apache.org/jira/browse/YARN-8309
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


When a YARN service application fails due to a token renewal issue, the 
diagnostic message is unclear.
{code:java}
Application application_1526413043392_0002 failed 20 times due to AM Container 
for appattempt_1526413043392_0002_20 exited with exitCode: 1 Failing this 
attempt.Diagnostics: [2018-05-15 23:15:28.779]Exception from container-launch. 
Container id: container_e04_1526413043392_0002_20_01 Exit code: 1 Exception 
message: Launch container failed Shell output: main : command provided 1 main : 
run as user is hbase main : requested yarn user is hbase Getting exit code 
file... Creating script paths... Writing pid file... Writing to tmp file 
/grid/0/hadoop/yarn/local/nmPrivate/application_1526413043392_0002/container_e04_1526413043392_0002_20_01/container_e04_1526413043392_0002_20_01.pid.tmp
 Writing to cgroup task files... Creating local dirs... Launching container... 
Getting exit code file... Creating script paths... [2018-05-15 
23:15:28.806]Container exited with a non-zero exit code 1. Error file: 
prelaunch.err. Last 4096 bytes of prelaunch.err : [2018-05-15 
23:15:28.807]Container exited with a non-zero exit code 1. Error file: 
prelaunch.err. Last 4096 bytes of prelaunch.err : For more detailed output, 
check the application tracking page: 
https://xxx:8090/cluster/app/application_1526413043392_0002 Then click on links 
to logs of each attempt. . Failing the application.{code}
Here, the diagnostic message should be improved to specify that the AM is 
failing due to token renewal issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8080) YARN native service should support component restart policy

2018-05-16 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8080:
---
Attachment: YARN-8080.016.patch

> YARN native service should support component restart policy
> ---
>
> Key: YARN-8080
> URL: https://issues.apache.org/jira/browse/YARN-8080
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8080.001.patch, YARN-8080.002.patch, 
> YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, 
> YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, 
> YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch, 
> YARN-8080.014.patch, YARN-8080.015.patch, YARN-8080.016.patch
>
>
> The existing native service assumes the service is long running and never 
> finishes; containers are restarted even if the exit code == 0. 
> To support broader use cases, we need to allow the restart policy of a 
> component to be specified by users. Proposed policies:
> 1) Always: containers are always restarted by the framework regardless of the 
> container exit status. This is the existing/default behavior.
> 2) Never: do not restart containers in any case after the container finishes. 
> This supports job-like workloads (for example a Tensorflow training job): if a 
> task exits with code == 0, we should not restart it. It can be used by 
> services which are not restartable/recoverable.
> 3) On-failure: similar to the above, but only restart tasks with exit code != 0. 
> Behavior after a component *instance* finalizes (Succeeded or Failed when 
> restart_policy != ALWAYS): 
> 1) For a single component, single instance: complete the service.
> 2) For a single component, multiple instances: other running instances of the 
> same component are not affected by the finalized instance. The service is 
> terminated once all instances have finalized. 
> 3) For multiple components: the service is terminated once all components have 
> finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8308) Yarn service app fails due to issues with Renew Token

2018-05-16 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8308:


 Summary: Yarn service app fails due to issues with Renew Token
 Key: YARN-8308
 URL: https://issues.apache.org/jira/browse/YARN-8308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Affects Versions: 3.1.0
Reporter: Yesha Vora


Run a YARN service application beyond dfs.namenode.delegation.token.max-lifetime. 
The YARN service application then fails with the error below.

{code}
2018-05-15 23:14:35,652 [main] WARN  ipc.Client - Exception encountered while 
connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
2018-05-15 23:14:35,654 [main] INFO  service.AbstractService - Service Service 
Master failed in state INITED
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
at org.apache.hadoop.ipc.Client.call(Client.java:1437)
at org.apache.hadoop.ipc.Client.call(Client.java:1347)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581)
at 
org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182)
at 
org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337)
at 
org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242)
at 
org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316)
2018-05-15 23:14:35,659 [main] INFO  service.ServiceMaster - Stopping app master
2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting 
service master
org.apache.hadoop.service.ServiceStateException: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
at 
org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
at 
org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316)
Caus

[jira] [Commented] (YARN-8103) Add CLI interface to query node attributes

2018-05-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478211#comment-16478211
 ] 

genericqa commented on YARN-8103:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-3409 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  6m 
53s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
30s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
33s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
39s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  7m  
1s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
45s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
45s{color} | {color:green} YARN-3409 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
21s{color} | {color:red} hadoop-sls in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 25m  
7s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 25m  7s{color} | 
{color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 25m  7s{color} 
| {color:red} root in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 46s{color} | {color:orange} root: The patch generated 14 new + 187 unchanged 
- 8 fixed = 201 total (was 195) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
27s{color} | {color:red} hadoop-sls in the patch failed. {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
23s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
14s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 2 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 21s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
47s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
24s{color} | {color:red} hadoop-sls in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  7m 16s{color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
51s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:

[jira] [Commented] (YARN-8273) Log aggregation does not warn if HDFS quota in target directory is exceeded

2018-05-16 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478207#comment-16478207
 ] 

Robert Kanter commented on YARN-8273:
-

Thanks for the patch.  One question:
- Why make {{LogAggregationDFSException}} an undeclared exception?  Why not 
make it a subclass of {{YarnException}} and declare it?
-- Also, if we do keep it as undeclared, it should be a subclass of 
{{YarnRuntimeException}} instead of {{RuntimeException}}.

I was going to suggest we catch the {{DSQuotaExceededException}} when closing 
{{writer}}, but it turns out that {{TFile#close}} does _not_ close the 
underlying {{FSDataOutputStream}}.  That's probably not what most people are 
expecting, but there's nothing we can do about that now. :/
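
For reference, a generic sketch of the pattern under discussion (all names 
invented; this is not the log-aggregation code): a space-quota violation often 
only surfaces when the underlying stream is closed, so the close is wrapped and 
the failure rethrown as a dedicated, declared exception instead of being 
silently dropped.

{code:java}
import java.io.IOException;
import java.io.OutputStream;

/** Illustrative sketch only; the exception and method names are invented. */
public class QuotaAwareCloseSketch {

  /** A checked, declared wrapper (the YarnException-style alternative). */
  static class LogAggregationQuotaException extends IOException {
    LogAggregationQuotaException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Closing the underlying stream is where a quota violation can finally
   *  surface, so wrap and rethrow it instead of swallowing it. */
  static void closeAggregationStream(OutputStream out)
      throws LogAggregationQuotaException {
    try {
      out.close();
    } catch (IOException e) {
      throw new LogAggregationQuotaException(
          "Failed to finish writing aggregated logs (possible HDFS quota violation)", e);
    }
  }
}
{code}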

> Log aggregation does not warn if HDFS quota in target directory is exceeded
> ---
>
> Key: YARN-8273
> URL: https://issues.apache.org/jira/browse/YARN-8273
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-8273.000.patch, YARN-8273.001.patch, 
> YARN-8273.002.patch
>
>
> It appears that if an HDFS space quota is set on a target directory for log 
> aggregation and the quota is already exceeded when log aggregation is 
> attempted, zero-byte log files will be written to the HDFS directory, however 
> NodeManager logs do not reflect a failure to write the files successfully 
> (i.e. there are no ERROR or WARN messages to this effect).
> An improvement may be worth investigating to alert users to this scenario, as 
> otherwise logs for a YARN application may be missing both on HDFS and locally 
> (after local log cleanup is done) and the user may not otherwise be informed.
> Steps to reproduce:
> * Set a small HDFS space quota on /tmp/logs/username/logs (e.g. 2MB)
> * Write files to HDFS such that /tmp/logs/username/logs is almost 2MB full
> * Run a Spark or MR job in the cluster
> * Observe that zero byte files are written to HDFS after job completion
> * Observe that YARN container logs are also not present on the NM hosts (or 
> are deleted after yarn.nodemanager.delete.debug-delay-sec)
> * Observe that no ERROR or WARN messages appear to be logged in the NM role 
> log



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-05-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478184#comment-16478184
 ] 

Eric Payne commented on YARN-4781:
--

Hi [~sunilg]. Will you have an opportunity to review the latest patch?

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.patch
>
>
> We introduced fairness queue policy since YARN-3319, which will let large 
> applications make progresses and not starve small applications. However, if a 
> large application takes the queue’s resources, and containers of the large 
> app has long lifespan, small applications could still wait for resources for 
> long time and SLAs cannot be guaranteed.
> Instead of wait for application release resources on their own, we need to 
> preempt resources of queue with fairness policy enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2018-05-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478178#comment-16478178
 ] 

genericqa commented on YARN-4599:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
11s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 31m 
25s{color} | {color:red} root in trunk failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
30m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m  
3s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m  
0s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 27m  0s{color} | 
{color:red} root generated 5 new + 6 unchanged - 0 fixed = 11 total (was 6) 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 27m  0s{color} 
| {color:red} root generated 188 new + 1277 unchanged - 0 fixed = 1465 total 
(was 1277) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
41s{color} | {color:green} root: The patch generated 0 new + 235 unchanged - 1 
fixed = 235 total (was 236) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 17m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
8m 25s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}150m 24s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}335m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.TestDFSInotifyEventInputStre

[jira] [Commented] (YARN-8071) Add ability to specify nodemanager environment variables individually

2018-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478163#comment-16478163
 ] 

Hudson commented on YARN-8071:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14213 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14213/])
YARN-8071. Add ability to specify nodemanager environment variables (jlowe: rev 
be539690477f7fee8f836bf3612cbe7ff6a3506e)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java


> Add ability to specify nodemanager environment variables individually
> -
>
> Key: YARN-8071
> URL: https://issues.apache.org/jira/browse/YARN-8071
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8071.001.patch, YARN-8071.002.patch, 
> YARN-8071.003.patch
>
>
> YARN-6830 describes a problem where environment variables that contain commas 
> cannot be specified via {{-Dmapreduce.map.env}}.
> For example:
> {{-Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS=/tmp/foo,/tmp/bar"}}
> will set {{MOUNTS}} to {{/tmp/foo}}
> In that Jira, [~aw] suggested that we change the API to provide a way to 
> specify environment variables individually, the same way that Spark does.
> {quote}Rather than fight with a regex why not redefine the API instead?
>  
> -Dmapreduce.map.env.MODE=bar
>  -Dmapreduce.map.env.IMAGE_NAME=foo
>  -Dmapreduce.map.env.MOUNTS=/tmp/foo,/tmp/bar
> ...
> e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar
> This greatly simplifies the input validation needed and makes it clear what 
> is actually being defined.
> {quote}
> The mapreduce properties were dealt with in [MAPREDUCE-7069].  This Jira will 
> address the YARN properties.
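
As an illustration of the per-variable form (a sketch only; the real 
Configuration/Apps utilities work differently), properties under a common 
prefix can be collected into an environment map without any comma escaping:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only: collects properties of the form
 *  "<prefix>.<VAR>=<value>" into an environment map, so values containing
 *  commas (such as MOUNTS=/tmp/foo,/tmp/bar) need no escaping. */
public class EnvPropertySketch {

  static Map<String, String> collectEnv(Map<String, String> props, String prefix) {
    Map<String, String> env = new LinkedHashMap<>();
    String dottedPrefix = prefix + ".";
    for (Map.Entry<String, String> entry : props.entrySet()) {
      if (entry.getKey().startsWith(dottedPrefix)) {
        env.put(entry.getKey().substring(dottedPrefix.length()), entry.getValue());
      }
    }
    return env;
  }

  public static void main(String[] args) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("mapreduce.map.env.MODE", "bar");
    props.put("mapreduce.map.env.IMAGE_NAME", "foo");
    props.put("mapreduce.map.env.MOUNTS", "/tmp/foo,/tmp/bar");
    // Prints {MODE=bar, IMAGE_NAME=foo, MOUNTS=/tmp/foo,/tmp/bar}
    System.out.println(collectEnv(props, "mapreduce.map.env"));
  }
}
{code}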



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478157#comment-16478157
 ] 

Eric Payne commented on YARN-8292:
--

bq. you can check 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyInterQueueWithDRF#test3ResourceTypesInterQueuePreemption
 test for details.
This test is not actually enabling DRF. You need to add the 5th argument to 
{{buildEnv()}}:
{code}
-buildEnv(labelsConfig, nodesConfig, queuesConfig, appsConfig);
+buildEnv(labelsConfig, nodesConfig, queuesConfig, appsConfig, true);
{code}

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending.
> Under existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8307) [atsv2 read acls] Coprocessor for reader authorization check

2018-05-16 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-8307:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-7055

> [atsv2 read acls] Coprocessor for reader authorization check
> 
>
> Key: YARN-8307
> URL: https://issues.apache.org/jira/browse/YARN-8307
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Vrushali C
>Priority: Major
>
> Jira to track coprocessor creation for reader authorization check when 
> security is enabled. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8307) [atsv2 read acls] Coprocessor for reader authorization check

2018-05-16 Thread Vrushali C (JIRA)
Vrushali C created YARN-8307:


 Summary: [atsv2 read acls] Coprocessor for reader authorization 
check
 Key: YARN-8307
 URL: https://issues.apache.org/jira/browse/YARN-8307
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vrushali C
Assignee: Vrushali C



Jira to track coprocessor creation for reader authorization check when security 
is enabled. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8071) Add ability to specify nodemanager environment variables individually

2018-05-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478114#comment-16478114
 ] 

Jason Lowe commented on YARN-8071:
--

bq.  I assume this windows-only test must be failing without this fix.

Ah, my bad.  I missed the fact that this test was Windows-only, so that's why 
it was "passing" even without the change.

+1 for the latest patch.  Committing this.


> Add ability to specify nodemanager environment variables individually
> -
>
> Key: YARN-8071
> URL: https://issues.apache.org/jira/browse/YARN-8071
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-8071.001.patch, YARN-8071.002.patch, 
> YARN-8071.003.patch
>
>
> YARN-6830 describes a problem where environment variables that contain commas 
> cannot be specified via {{-Dmapreduce.map.env}}.
> For example:
> {{-Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS=/tmp/foo,/tmp/bar"}}
> will set {{MOUNTS}} to {{/tmp/foo}}
> In that Jira, [~aw] suggested that we change the API to provide a way to 
> specify environment variables individually, the same way that Spark does.
> {quote}Rather than fight with a regex why not redefine the API instead?
>  
> -Dmapreduce.map.env.MODE=bar
>  -Dmapreduce.map.env.IMAGE_NAME=foo
>  -Dmapreduce.map.env.MOUNTS=/tmp/foo,/tmp/bar
> ...
> e.g., mapreduce.map.env.[foo]=bar gets turned into foo=bar
> This greatly simplifies the input validation needed and makes it clear what 
> is actually being defined.
> {quote}
> The mapreduce properties were dealt with in [MAPREDUCE-7069].  This Jira will 
> address the YARN properties.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478104#comment-16478104
 ] 

Jason Lowe commented on YARN-8292:
--

bq. After preemption, there're at least one 0 major resources (which indicates 
that the queue is still satisfied after preemption).

I'm still confused by this point.  How is that not going to be always true when 
the cluster has a rarely-used resource dimension?  For example, let's say GPU 
is one of the dimensions, and all the apps that want to use GPUs are all 
running in only one of many queues on the cluster.  The other queues will all 
have zero for their GPU usage, and any cross-queue preemptions between those 
other queues will all have zero in the GPU resource for toObtainFromPartition 
and toObtainAfterPreemption.  In other words, it effectively disables the 
less-than-Resources.none check when comparing preemptions between these 
non-GPU-using queues, because GPU will always be zero and so 
isAnyMajorResourceZero will always be true.  Or am I missing something?

For the case of not wanting to kill a container that is (4, 1, 1) when the ask 
is only (3, -1, -1), the comparison against Resources.none should cover that.  
What is an example scenario where the additional check if any resource 
dimension is zero is needed to do the right thing?  From the scenario I 
described above, I can see where it can (incorrectly?) override the comparison 
against Resources.none and preempt a (4, 1, 0) container when the ask is only 
(3, -1, 0).
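
To make the vector cases in this thread concrete, here is a small
self-contained sketch using plain int arrays in place of Resource objects. It
mirrors guards (a) and (b) only as they are described in words here, not the
exact Resources/DominantResourceCalculator semantics:
{code:java}
public class PreemptionGuardSketch {

    static int[] subtract(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] - b[i];
        }
        return r;
    }

    static boolean anyPositive(int[] v) {
        for (int x : v) {
            if (x > 0) {
                return true;
            }
        }
        return false;
    }

    static boolean anyZero(int[] v) {
        for (int x : v) {
            if (x == 0) {
                return true;
            }
        }
        return false;
    }

    // Keep preempting only if, after taking the container, some dimension of
    // res-to-obtain is still positive, or some dimension is exactly zero.
    static boolean doPreempt(int[] toObtain, int[] container) {
        int[] after = subtract(toObtain, container);
        return anyPositive(after) || anyZero(after);
    }

    public static void main(String[] args) {
        // Over-preemption case: (3,-1,-1) - (4,1,1) = (-1,-2,-2) -> skipped.
        System.out.println(doPreempt(new int[]{3, -1, -1}, new int[]{4, 1, 1}));  // false
        // Zero-GPU case raised above: (3,-1,0) - (4,1,0) = (-1,-2,0) -> the zero
        // GPU dimension keeps the guard true, so the container would be taken.
        System.out.println(doPreempt(new int[]{3, -1, 0}, new int[]{4, 1, 0}));   // true
    }
}
{code}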


> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending.
> Under existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8286) Add NMClient callback on container relaunch

2018-05-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478085#comment-16478085
 ] 

Jason Lowe commented on YARN-8286:
--

This could be implemented as something in the AM/NM client connection, but it 
would require the AM to keep a long-lived connection to every NM that has 
containers running on it.  I think a simpler approach for the AM is to get this 
information via the same channel it gets other container notifications like 
allocated, completed, etc. and that's the AM-RM heartbeat (i.e.: 
ApplicationMasterProtocol#allocate).  Currently when a container completes on 
the NM side, the NM lets the RM know via an out-of-band heartbeat, and the RM 
in turn lets the AM know on the next AM heartbeat.

I think it would be relatively straightforward to have the NM notify the RM of 
any container relaunches, just like it already does for container launches and 
completions.  The RM can then relay this information to the AM.  Then the AM 
wouldn't need to stay connected to every NM for relaunch status, and the 
container relaunch events would arrive at the AM just like container completion 
events do today, without any new connections required.  Thoughts?
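
For illustration, a minimal sketch of the AM-side handling this proposal
implies; every type below is a hypothetical stand-in rather than the real
ApplicationMasterProtocol/NMClient classes. The point is that relaunch events
would be consumed on the same path as completion events, so no extra AM-to-NM
connection is needed:
{code:java}
import java.util.ArrayList;
import java.util.List;

public class RelaunchEventFlowSketch {

    // Hypothetical stand-in for one container status entry in an allocate response.
    static class ContainerEvent {
        final String containerId;
        final String type;   // e.g. "COMPLETED" or "RELAUNCHED"

        ContainerEvent(String containerId, String type) {
            this.containerId = containerId;
            this.type = type;
        }
    }

    // Hypothetical AM-side handling: relaunch events arrive alongside
    // completion events on the regular AM-RM heartbeat.
    static void onAllocateResponse(List<ContainerEvent> events) {
        for (ContainerEvent e : events) {
            if ("COMPLETED".equals(e.type)) {
                System.out.println("container finished: " + e.containerId);
            } else if ("RELAUNCHED".equals(e.type)) {
                System.out.println("container relaunched, refresh recorded state/IP: "
                    + e.containerId);
            }
        }
    }

    public static void main(String[] args) {
        List<ContainerEvent> response = new ArrayList<ContainerEvent>();
        response.add(new ContainerEvent("container_e01_0001_01_000002", "RELAUNCHED"));
        response.add(new ContainerEvent("container_e01_0001_01_000003", "COMPLETED"));
        onAllocateResponse(response);
    }
}
{code}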


> Add NMClient callback on container relaunch
> ---
>
> Key: YARN-8286
> URL: https://issues.apache.org/jira/browse/YARN-8286
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Priority: Critical
>
> The AM may need to perform actions when a container has been relaunched. For 
> example, the service AM would want to change the state it has recorded for 
> the container and retrieve new container status for the container, in case 
> the container IP has changed. (The NM would also need to remove the IP it has 
> stored for the container, so container status calls don't return an IP for a 
> container that is not currently running.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7933) [atsv2 read acls] Add TimelineWriter#writeDomain

2018-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478053#comment-16478053
 ] 

Hudson commented on YARN-7933:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14212 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14212/])
YARN-7933. [atsv2 read acls] Add TimelineWriter#writeDomain. (Rohith 
(haibochen: rev e3b7d7ac1694b8766ae11bc7e8ecf09763bb26db)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/TimelineWriter.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/collector/TestTimelineCollector.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollector.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/TestHBaseTimelineStorageDomain.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/domain/DomainTableRW.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/HBaseTimelineWriterImpl.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timelineservice/TimelineDomain.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorWebService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/FileSystemTimelineWriterImpl.java


> [atsv2 read acls] Add TimelineWriter#writeDomain 
> -
>
> Key: YARN-7933
> URL: https://issues.apache.org/jira/browse/YARN-7933
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7933.01.patch, YARN-7933.02.patch, 
> YARN-7933.03.patch, YARN-7933.04.patch, YARN-7933.05.patch, YARN-7933.06.patch
>
>
>  
> Add an API TimelineWriter#writeDomain for writing the domain info 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8305) [UI2] No information available per container about the memory/vcores

2018-05-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8305:
-
Reporter: Sumana Sathish  (was: Gergely Novák)

> [UI2] No information available per container about the memory/vcores
> 
>
> Key: YARN-8305
> URL: https://issues.apache.org/jira/browse/YARN-8305
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-ui-v2
>Reporter: Sumana Sathish
>Assignee: Gergely Novák
>Priority: Major
> Attachments: YARN-8305.001.patch
>
>
> In the Applications > App > Attempts > Attempt page, the Containers panel 
> shows information about container start time, status, etc., but not about 
> container memory/vcores used.
> Discovered by [~ssath...@hortonworks.com].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8292:
-
Description: 
This is an example of the problem: 
  
{code}
//   guaranteed,  max,used,   pending
"root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
"-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
"-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
"-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
{code}

There are 3 resource types. The total resource of the cluster is 30:18:6.
For both a and b, there are 3 containers running, and each container is 2:2:1.
Queue c uses 0 resources and has 1:1:1 pending.

Under existing logic, preemption cannot happen.

  was:
 
 This is an example of the problem: (Same if we have more than 2 resources)
  

Let's say we have 3 queues A/B/C. All containers with equal size <2,3>

 
||Queue||Guaranteed||Used ||Pending||
|A|<20, 10>|<20,30>| |
|B|<20, 10>|0|0|
|C|<20, 10>|0|<20, 30>|
| | | | |

 

Under current logic, A's calculated to-preempt (how much resource other queues 
can preempt) will be <0, 20>. The preemption will not happen. However, under 
the context of DRC, queue A is using more resources than guaranteed, so queue C 
will be starved.


> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending.
> Under existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8292:
-
Fix Version/s: (was: 3.1.1)
   (was: 3.2.0)

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending.
> Under existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8292:
-
Target Version/s: 3.2.0, 3.1.1

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending.
> Under existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478026#comment-16478026
 ] 

Wangda Tan edited comment on YARN-8292 at 5/16/18 8:16 PM:
---

Thanks [~eepayne], 

Let me revise the example a bit. I was trying to simplify it to avoid confusing 
people, but I think that example is wrong. Here's the one I was using in a test:
{code}
//   guaranteed,  max,used,   pending
"root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
"-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
"-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
"-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
{code}

There are 3 resource types. The total resource of the cluster is 30:18:6.
For both a and b, there are 3 containers running, and each container is 2:2:1.
Queue c uses 0 resources and has 1:1:1 pending.

Prior to the attached patch, the preemption cannot happen; you can check the 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyInterQueueWithDRF#test3ResourceTypesInterQueuePreemption
test for details.


was (Author: leftnoteasy):
Thanks [~eepayne], 

Let me revise the example a bit. I was trying to simplify it to avoid confusing 
people, but I think that example is wrong. Here's the one I was using in a test:
{code}
//   guaranteed,  max,used,   pending
"root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
"-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
"-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
"-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
{code}

There are 3 resource types. The total resource of the cluster is 30:18:6.
For both a and b, there are 3 containers running, and each container is 2:2:1.
Prior to the attached patch, the preemption cannot happen; you can check the 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyInterQueueWithDRF#test3ResourceTypesInterQueuePreemption
test for details.

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch
>
>
>  
>  This is an example of the problem: (Same if we have more than 2 resources)
>   
> Let's say we have 3 queues A/B/C. All containers with equal size <2,3>
>  
> ||Queue||Guaranteed||Used ||Pending||
> |A|<20, 10>|<20,30>| |
> |B|<20, 10>|0|0|
> |C|<20, 10>|0|<20, 30>|
> | | | | |
>  
> Under current logic, A's calculated to-preempt (how much resource other queues 
> can preempt) will be <0, 20>. The preemption will not happen. However, under 
> the context of DRC, queue A is using more resources than guaranteed, so queue 
> C will be starved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478026#comment-16478026
 ] 

Wangda Tan commented on YARN-8292:
--

Thanks [~eepayne], 

Let me revise the example a bit. I was trying to simplify it to avoid confusing 
people, but I think that example is wrong. Here's the one I was using in a test:
{code}
//   guaranteed,  max,used,   pending
"root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
"-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
"-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
"-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
{code}

There are 3 resource types. The total resource of the cluster is 30:18:6.
For both a and b, there are 3 containers running, and each container is 2:2:1.
Prior to the attached patch, the preemption cannot happen; you can check the 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyInterQueueWithDRF#test3ResourceTypesInterQueuePreemption
test for details.

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8292.001.patch
>
>
>  
>  This is an example of the problem: (Same if we have more than 2 resources)
>   
> Let's say we have 3 queues A/B/C. All containers with equal size <2,3>
>  
> ||Queue||Guaranteed||Used ||Pending||
> |A|<20, 10>|<20,30>| |
> |B|<20, 10>|0|0|
> |C|<20, 10>|0|<20, 30>|
> | | | | |
>  
> Under current logic, A's calculated to-preempt (how much resource other queues 
> can preempt) will be <0, 20>. The preemption will not happen. However, under 
> the context of DRC, queue A is using more resources than guaranteed, so queue 
> C will be starved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2018-05-16 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478004#comment-16478004
 ] 

Miklos Szegedi commented on YARN-4599:
--

[~snemeth], thanks for the review.

oomHandlerTemp is necessary since we may throw an exception after we set it. 
External code, under JVM optimization (a variable being set before the 
constructor has finished), may observe a partially constructed copy of this 
object if the exception is thrown. It is better to set the fields only once no 
exception can be thrown anymore.
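
For illustration, a minimal sketch of the pattern described above, with
hypothetical class and method names: build and validate into a local first, and
publish to the instance field only after the last point that can throw.
{code:java}
public class OomHandlerHolder {
    private final Runnable oomHandler;   // published only after validation succeeds

    public OomHandlerHolder(String handlerClassName) throws ReflectiveOperationException {
        // This call may throw; until we are past it, nothing is published.
        Runnable oomHandlerTemp = (Runnable) Class.forName(handlerClassName)
            .getDeclaredConstructor().newInstance();
        // No exception can be thrown past this point, so setting the field is
        // safe: a reader can never observe a half-initialized handler.
        this.oomHandler = oomHandlerTemp;
    }

    public void runHandler() {
        oomHandler.run();
    }

    // Trivial handler used only to make the sketch runnable.
    public static class NoopHandler implements Runnable {
        @Override
        public void run() {
            System.out.println("handling OOM event");
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        new OomHandlerHolder(NoopHandler.class.getName()).runHandler();
    }
}
{code}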

3) This code has to be very fast and may do the same conversion for 1000s of 
containers, so I will keep the plain multiplication instead of calling into 
external code that deals with strings.

 

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Miklos Szegedi
>Priority: Major
>  Labels: oct16-medium
> Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, 
> YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, 
> YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, 
> YARN-4599.007.patch, YARN-4599.008.patch, YARN-4599.009.patch, 
> YARN-4599.010.patch, YARN-4599.011.patch, YARN-4599.sandflee.patch, 
> yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroups enforcing support. We should also explicitly 
> set OOM control so that containers are not killed as soon as they go over 
> their usage. Today, one could set the swappiness to control this, but 
> clusters with swap turned off exist.
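
For illustration only, a minimal sketch of flipping the cgroup v1 knob the
description refers to: writing "1" to memory.oom_control disables the kernel
OOM killer for that cgroup, so the NM can decide what to kill instead. The
cgroup path below is a hypothetical example and the write requires suitable
permissions:
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OomControlSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical memory cgroup of one container; adjust to the real
        // cgroup mount point and hierarchy in use.
        Path oomControl = Paths.get(
            "/sys/fs/cgroup/memory/hadoop-yarn/container_0000000000001_0001_01_000001/memory.oom_control");
        // "1" sets oom_kill_disable for this cgroup (cgroup v1 semantics).
        Files.write(oomControl, "1".getBytes(StandardCharsets.UTF_8));
    }
}
{code}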



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477999#comment-16477999
 ] 

Wangda Tan edited comment on YARN-8292 at 5/16/18 8:08 PM:
---

[~jlowe], thanks for your review, 

bq. I think the check for a zero resource can be dropped and it simplifies to 
the toObtainAfterPreemption component-wise max'd with zero is less than the 
amount to obtain from the partition (after being max'd with zero). In other 
words, we want to preempt as long as we have some resources we want to obtain 
from the partition and preempting the container makes progress on at least one 
of the resource dimensions being requested from the partition.
The second part is correct ({{..at least one of the resource dimensions being 
requested from the partition}}); that's why we added the following check:
{code}
doPreempt = doPreempt && (Resources.lessThan(rc, clusterResource,
    Resources.componentwiseMax(toObtainAfterPreemption, Resources.none()),
    Resources.componentwiseMax(toObtainByPartition, Resources.none())));
{code}

The check of {{toObtainAfterPreemption}} is to make sure we will not 
over-preempt. For example, say a queue's res-to-obtain is (3,-1,-1) and a 
container is (4,1,1). Even if preempting the container makes a positive 
contribution, we will not do it, because after the preemption the queue becomes 
an under-utilized queue and may itself preempt resources from other queues.

The following logic mostly covers two cases to avoid over-preemption:
{code}
doPreempt = Resources.greaterThanOrEqual(rc, clusterResource,
        toObtainAfterPreemption, Resources.none())
    || Resources.isAnyMajorResourceZero(rc, toObtainAfterPreemption);
{code}
a. After preemption, some major resources are still positive.
b. After preemption, at least one major resource is 0 (which indicates that the 
queue is still satisfied after preemption).

Please let me know if you still have any other questions.


was (Author: leftnoteasy):
[~jlowe], thanks for your review, 

bq. I think the check for a zero resource can be dropped and it simplifies to 
the toObtainAfterPreemption component-wise max'd with zero is less than the 
amount to obtain from the partition (after being max'd with zero). In other 
words, we want to preempt as long as we have some resources we want to obtain 
from the partition and preempting the container makes progress on at least one 
of the resource dimensions being requested from the partition.
The second part is correct; that's why we added the following check:
{code}
doPreempt = doPreempt && (Resources.lessThan(rc, clusterResource,
    Resources.componentwiseMax(toObtainAfterPreemption, Resources.none()),
    Resources.componentwiseMax(toObtainByPartition, Resources.none())));
{code}

The check of {{toObtainAfterPreemption}} is to make sure we will not 
over-preempt. For example, say a queue's res-to-obtain is (3,-1,-1) and a 
container is (4,1,1). Even if preempting the container makes a positive 
contribution, we will not do it, because after the preemption the queue becomes 
an under-utilized queue and may itself preempt resources from other queues.

The following logic mostly covers two cases to avoid over-preemption:
{code}
doPreempt = Resources.greaterThanOrEqual(rc, clusterResource,
        toObtainAfterPreemption, Resources.none())
    || Resources.isAnyMajorResourceZero(rc, toObtainAfterPreemption);
{code}
a. After preemption, some major resources are still positive.
b. After preemption, at least one major resource is 0 (which indicates that the 
queue is still satisfied after preemption).

Please let me know if you still have any other questions.

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8292.001.patch
>
>
>  
>  This is an example of the problem: (Same if we have more than 2 resources)
>   
> Let's say we have 3 queues A/B/C. All containers with equal size <2,3>
>  
> ||Queue||Guaranteed||Used ||Pending||
> |A|<20, 10>|<20,30>| |
> |B|<20, 10>|0|0|
> |C|<20, 10>|0|<20, 30>|
> | | | | |
>  
> Under current logic, A's calculated to-preempt (how much resource other queue 
> can preempt) will be <

[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477999#comment-16477999
 ] 

Wangda Tan commented on YARN-8292:
--

[~jlowe], thanks for your review, 

bq. I think the check for a zero resource can be dropped and it simplifies to 
the toObtainAfterPreemption component-wise max'd with zero is less than the 
amount to obtain from the partition (after being max'd with zero). In other 
words, we want to preempt as long as we have some resources we want to obtain 
from the partition and preempting the container makes progress on at least one 
of the resource dimensions being requested from the partition.
The second part is correct; that's why we added the following check:
{code}
doPreempt = doPreempt && (Resources.lessThan(rc, clusterResource,
    Resources.componentwiseMax(toObtainAfterPreemption, Resources.none()),
    Resources.componentwiseMax(toObtainByPartition, Resources.none())));
{code}

The check of {{toObtainAfterPreemption}} is to make sure we will not 
over-preempt. For example, say a queue's res-to-obtain is (3,-1,-1) and a 
container is (4,1,1). Even if preempting the container makes a positive 
contribution, we will not do it, because after the preemption the queue becomes 
an under-utilized queue and may itself preempt resources from other queues.

The following logic mostly covers two cases to avoid over-preemption:
{code}
doPreempt = Resources.greaterThanOrEqual(rc, clusterResource,
        toObtainAfterPreemption, Resources.none())
    || Resources.isAnyMajorResourceZero(rc, toObtainAfterPreemption);
{code}
a. After preemption, some major resources are still positive.
b. After preemption, at least one major resource is 0 (which indicates that the 
queue is still satisfied after preemption).

Please let me know if you still have any other questions.

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8292.001.patch
>
>
>  
>  This is an example of the problem: (Same if we have more than 2 resources)
>   
> Let's say we have 3 queues A/B/C. All containers with equal size <2,3>
>  
> ||Queue||Guaranteed||Used ||Pending||
> |A|<20, 10>|<20,30>| |
> |B|<20, 10>|0|0|
> |C|<20, 10>|0|<20, 30>|
> | | | | |
>  
> Under current logic, A's calculated to-preempt (how much resource other queues 
> can preempt) will be <0, 20>. The preemption will not happen. However, under 
> the context of DRC, queue A is using more resources than guaranteed, so queue 
> C will be starved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478000#comment-16478000
 ] 

Eric Payne commented on YARN-8292:
--

{quote}
||Queue||Guaranteed||Used ||Pending||
|A|<20, 10>|<20,30>| |
|B|<20, 10>|0|0|
|C|<20, 10>|0|<20, 30>|
Under current logic, A's calculated to-preempt (how much resource other queues 
can preempt) will be <0, 20>. The preemption will not happen.
{quote}
I want to challenge the original example. The above does cause preemption. I 
have tested this scenario, and it does preempt.

In my tests, the first resource is memory and the second is vcores. I think the 
reason is that the dominant resource calculator will determine that vcores is a 
higher percentage of the available resources than memory, so vcores is dominant.
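
A quick worked example of that arithmetic, assuming the cluster total is the
sum of the queue guarantees, i.e. <60 memory, 30 vcores> (an assumption for
illustration; the report above does not state the cluster size):
{code:java}
public class DominantShareExample {
    public static void main(String[] args) {
        double clusterMem = 60, clusterVcores = 30;     // assumed cluster total
        double usedMem = 20, usedVcores = 30;           // queue A's usage <20, 30>

        double memShare = usedMem / clusterMem;         // ~0.33
        double vcoreShare = usedVcores / clusterVcores; // 1.0

        // The dominant resource is the one with the larger share: vcores here,
        // so under DRC queue A is treated as well over its guarantee.
        System.out.printf("memory share = %.2f, vcore share = %.2f%n",
            memShare, vcoreShare);
    }
}
{code}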

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8292.001.patch
>
>
>  
>  This is an example of the problem: (Same if we have more than 2 resources)
>   
> Let's say we have 3 queues A/B/C. All containers with equal size <2,3>
>  
> ||Queue||Guaranteed||Used ||Pending||
> |A|<20, 10>|<20,30>| |
> |B|<20, 10>|0|0|
> |C|<20, 10>|0|<20, 30>|
> | | | | |
>  
> Under current logic, A's calculated to-preempt (how much resource other queues 
> can preempt) will be <0, 20>. The preemption will not happen. However, under 
> the context of DRC, queue A is using more resources than guaranteed, so queue 
> C will be starved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7933) [atsv2 read acls] Add TimelineWriter#writeDomain

2018-05-16 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477991#comment-16477991
 ] 

Haibo Chen commented on YARN-7933:
--

Thanks [~rohithsharma] for the patch! I have checked it in to trunk.

> [atsv2 read acls] Add TimelineWriter#writeDomain 
> -
>
> Key: YARN-7933
> URL: https://issues.apache.org/jira/browse/YARN-7933
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7933.01.patch, YARN-7933.02.patch, 
> YARN-7933.03.patch, YARN-7933.04.patch, YARN-7933.05.patch, YARN-7933.06.patch
>
>
>  
> Add an API TimelineWriter#writeDomain for writing the domain info 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7278) LinuxContainer in docker mode will be failed when nodemanager restart, because timeout for docker is too slow.

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7278:
--
Labels: Docker  (was: )

> LinuxContainer in docker mode will be failed when nodemanager restart, 
> because timeout for docker is too slow.
> --
>
> Key: YARN-7278
> URL: https://issues.apache.org/jira/browse/YARN-7278
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.0
> Environment: CentOS
>Reporter: zhengchenyu
>Priority: Major
>  Labels: Docker
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> In our cluster, nodemanager recovery is turned on, and we use LinuxContainer 
> in docker mode.
> Containers may fail when the nodemanager restarts; the exception is below:
> {code}
> [2017-09-29T15:47:14.433+08:00] [INFO] 
> containermanager.monitor.ContainersMonitorImpl.run(ContainersMonitorImpl.java 
> 472) [Container Monitor] : Memory usage of ProcessTree 120523 for 
> container-id container_1506600355508_0023_01_04: -1B of 10 GB physical 
> memory used; -1B of 31 GB virtual memory used
> [2017-09-29T15:47:15.219+08:00] [ERROR] 
> containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java
>  93) [ContainersLauncher #1] : Unable to recover container 
> container_1506600355508_0023_01_04
> java.io.IOException: Timeout while waiting for exit code from 
> container_1506600355508_0023_01_04
> [2017-09-29T15:47:15.220+08:00] [INFO] 
> containermanager.container.ContainerImpl.handle(ContainerImpl.java 1142) 
> [AsyncDispatcher event handler] : Container 
> container_1506600355508_0023_01_04 transitioned from RUNNING to 
> EXITED_WITH_FAILURE
> [2017-09-29T15:47:15.221+08:00] [INFO] 
> containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java
>  440) [AsyncDispatcher event handler] : Cleaning up container 
> container_1506600355508_0023_01_04
> {code}
> I guess the process is done, but 2 seconds later (the variable is msecLeft) 
> the *.pid.exitcode had not been created. Then I changed the variable to 2ms, 
> and the container succeeded when the nodemanager was restarted.
> So I think the timeout is too short for the docker container to complete the work.
> In docker mode of LinuxContainer, the NM monitors the real task, which is 
> launched by the "docker run" command. Then the "docker wait" command waits for 
> the exit code, and "docker rm" deletes the docker container. Lastly, 
> container-executor writes the exit code. So if some docker command is slow 
> enough, the NM won't monitor the container. In fact, docker rm is always slow.
> I think the exit code of docker rm does not matter to the real task, so I 
> think we could move the write of "*.pid.exitcode" before the docker rm 
> command. Or monitor the docker wait process instead of the real task.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7246) Fix the default docker binary path

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7246:
--
Labels: Docker  (was: )

> Fix the default docker binary path
> --
>
> Key: YARN-7246
> URL: https://issues.apache.org/jira/browse/YARN-7246
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Blocker
>  Labels: Docker
> Fix For: 2.8.2
>
> Attachments: YARN-7246-branch-2.8.2.001.patch, 
> YARN-7246-branch-2.8.2.002.patch, YARN-7246-branch-2.8.2.003.patch, 
> YARN-7246-branch-2.8.2.004.patch, YARN-7246-branch-2.8.2.005.patch, 
> YARN-7246-branch-2.8.2.006.patch, YARN-7246-branch-2.8.2.007.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8141:
--
Labels: Docker  (was: )

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch, YARN-8141.004.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user 
> specified this in service spec or not. It is important to allow user to mount 
> local folders like /etc/passwd, etc.
> Following logic overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
> sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", 
> sb.toString());{code}
> Inside AbstractLauncher.java
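
For illustration, a minimal sketch of the merge-instead-of-overwrite behavior
being requested here (keep what the service spec already set, then append the
framework-added mounts); this is illustrative only, not the code committed for
this issue:
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class MountsEnvMergeSketch {
    static final String MOUNTS_ENV =
        "YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS";

    static void addLocalResourceMounts(Map<String, String> env,
                                       Map<String, String> mountPaths) {
        StringBuilder sb = new StringBuilder();
        String existing = env.get(MOUNTS_ENV);
        if (existing != null && !existing.isEmpty()) {
            sb.append(existing);              // keep mounts from the service spec
        }
        for (Map.Entry<String, String> mount : mountPaths.entrySet()) {
            if (sb.length() > 0) {
                sb.append(",");
            }
            sb.append(mount.getKey()).append(":").append(mount.getValue());
        }
        env.put(MOUNTS_ENV, sb.toString());
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<String, String>();
        env.put(MOUNTS_ENV, "/etc/passwd:/etc/passwd");    // user-specified in spec
        Map<String, String> mounts = new LinkedHashMap<String, String>();
        mounts.put("/localized/conf", "/opt/conf");         // framework-added mount
        addLocalResourceMounts(env, mounts);
        // Prints /etc/passwd:/etc/passwd,/localized/conf:/opt/conf
        System.out.println(env.get(MOUNTS_ENV));
    }
}
{code}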



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4018:
--
Labels: Docker  (was: )

> correct docker image name is rejected by DockerContainerExecutor
> 
>
> Key: YARN-4018
> URL: https://issues.apache.org/jira/browse/YARN-4018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Major
>  Labels: Docker
> Attachments: YARN-4018.patch
>
>
> For example:
> "www.dockerbase.net/library/mongo"
> "www.dockerbase.net:5000/library/mongo:latest"
> lead to the error:
> Image: www.dockerbase.net/library/mongo is not a proper docker image
> Image: www.dockerbase.net:5000/library/mongo:latest is not a proper docker 
> image



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8274) Docker command error during container relaunch

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8274:
--
Labels: Docker  (was: )

> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8274.001.patch, YARN-8274.002.patch
>
>
> I initiated container relaunch with a "sleep 60; exit 1" launch command and 
> saw a "not a docker command" error on relaunch. Haven't figured out why this 
> is happening, but it seems like it has been introduced recently to 
> trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Relaunch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> container-launch.
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
> container_1525897486447_0003_01_02
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception 
> message: Relaunch container failed
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error 
> output: docker: 'container_1525897486447_0003_01_02' is not a docker 
> command.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4016) docker container is still running when app is killed

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4016:
--
Labels: Docker  (was: )

> docker container is still running when app is killed
> 
>
> Key: YARN-4016
> URL: https://issues.apache.org/jira/browse/YARN-4016
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Major
>  Labels: Docker
>
> The docker_container_executor_session.sh is generated like below:
> {code}
> ### get the pid of docker container by "docker inspect"
> echo `/usr/bin/docker inspect --format {{.State.Pid}} 
> container_1438681002528_0001_01_02` > 
> .../container_1438681002528_0001_01_02.pid.tmp
> ### rename *.pid.tmp to *.pid
> /bin/mv -f .../container_1438681002528_0001_01_02.pid.tmp 
> .../container_1438681002528_0001_01_02.pid
> ### launch the docker container
> /usr/bin/docker run  --rm  --net=host --name 
> container_1438681002528_0001_01_02 -v ... library/mysql 
> /container_1438681002528_0001_01_02/launch_container.sh" 
> {code}
> This is obviously wrong because you cannot get the pid of a docker container 
> before starting it.  When the NodeManager tries to kill the container, pid zero 
> is always read from the pid file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3201) add args for DistributedShell to specify a image for tasks that will run on docker

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-3201:
--
Labels: Docker  (was: )

> add args for DistributedShell to specify a image for tasks that will run on 
> docker
> --
>
> Key: YARN-3201
> URL: https://issues.apache.org/jira/browse/YARN-3201
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: applications/distributed-shell
>Reporter: zhangwei
>Assignee: yarntime
>Priority: Major
>  Labels: Docker
>
> It's very useful to execute a script on docker to do some tests, but the 
> distributedshell has no argument to set the image.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2976) Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-2976:
--
Labels: Docker  (was: )

> Invalid docs for specifying 
> yarn.nodemanager.docker-container-executor.exec-name
> 
>
> Key: YARN-2976
> URL: https://issues.apache.org/jira/browse/YARN-2976
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Hitesh Shah
>Assignee: Vijay Bhat
>Priority: Minor
>  Labels: Docker
>
> Docs on 
> http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html
>  mention setting "docker -H=tcp://0.0.0.0:4243" for 
> yarn.nodemanager.docker-container-executor.exec-name. 
> However, the actual implementation does a fileExists for the specified value. 
> Either the docs need to be fixed or the impl changed to allow relative paths 
> or commands with additional args



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-3302:
--
Labels: Docker  (was: )

> TestDockerContainerExecutor should run automatically if it can detect docker 
> in the usual place
> ---
>
> Key: YARN-3302
> URL: https://issues.apache.org/jira/browse/YARN-3302
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0
>Reporter: Ravi Prakash
>Assignee: Ravindra Kumar Naik
>Priority: Major
>  Labels: Docker
> Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, 
> YARN-3302-trunk.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5426) Enable distributed shell to launch docker containers

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5426:
--
Labels: Docker  (was: )

> Enable distributed shell to launch docker containers
> 
>
> Key: YARN-5426
> URL: https://issues.apache.org/jira/browse/YARN-5426
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Priority: Major
>  Labels: Docker
>
> This could be the easiest way to bring up docker containers on YARN. 
> An option like -docker , or with the ability to run different types of 
> images with locality requirements altogether. In short, I think making docker 
> containers a first-class thing in distributed shell is useful for testing 
> docker-based services on YARN in the long term.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6387) Provide a flag in Rest API GET response to notify if the app launch delay is due to docker image download.

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6387:
--
Labels: Docker  (was: )

> Provide a flag in Rest API GET response to notify if the app launch delay is 
> due to docker image download.
> --
>
> Key: YARN-6387
> URL: https://issues.apache.org/jira/browse/YARN-6387
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: sriharsha devineni
>Priority: Major
>  Labels: Docker
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3289) Docker images should be downloaded during localization

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-3289:
--
Labels: Docker  (was: )

> Docker images should be downloaded during localization
> --
>
> Key: YARN-3289
> URL: https://issues.apache.org/jira/browse/YARN-3289
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ravi Prakash
>Priority: Major
>  Labels: Docker
>
> We currently call docker run on images while launching containers. If the 
> image size is sufficiently big, the task will time out. We should download the 
> image we want to run during localization (if possible) to prevent this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6454) Add support for setting environment variables for docker containers

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6454:
--
Labels: Docker  (was: )

> Add support for setting environment variables for docker containers
> ---
>
> Key: YARN-6454
> URL: https://issues.apache.org/jira/browse/YARN-6454
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Jaeboo Jeong
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6454.001.patch
>
>
> Docker allows you to set environment variables for your containers with the -e 
> flag.
> You can set environment variables like below.
> {code}
> YARN_CONTAINER_RUNTIME_DOCKER_ENVIRONMENT_VARIABLES="HADOOP_CONF_DIR=$HADOOP_CONF_DIR,HADOOP_HDFS_HOME=/opt/hadoop"
> {code}
> If you want to set a list of values, apply YARN-6434 first.
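
For illustration, here is a minimal sketch of how such a comma-separated value could be turned into individual docker {{-e}} arguments, and why values that themselves contain commas need YARN-6434. This is not the actual YARN runtime code; the class and method names are made up.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class DockerEnvArgs {
  // Splits "A=x,B=y" into ["-e", "A=x", "-e", "B=y"]. A plain split on ','
  // breaks values that contain commas, which is the case YARN-6434 addresses.
  static List<String> toDockerArgs(String envSpec) {
    List<String> args = new ArrayList<>();
    if (envSpec == null || envSpec.isEmpty()) {
      return args;
    }
    for (String pair : envSpec.split(",")) {
      if (!pair.trim().isEmpty()) {
        args.add("-e");
        args.add(pair.trim());
      }
    }
    return args;
  }

  public static void main(String[] args) {
    System.out.println(toDockerArgs(
        "HADOOP_CONF_DIR=/etc/hadoop/conf,HADOOP_HDFS_HOME=/opt/hadoop"));
  }
}
{code}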



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7996) Allow user supplied Docker client configurations with YARN native services

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7996:
--
Labels: Docker  (was: )

> Allow user supplied Docker client configurations with YARN native services
> --
>
> Key: YARN-7996
> URL: https://issues.apache.org/jira/browse/YARN-7996
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7996.001.patch, YARN-7996.002.patch, 
> YARN-7996.003.patch, YARN-7996.004.patch, YARN-7996.005.patch, 
> YARN-7996.006.patch
>
>
> YARN-5428 added support to distributed shell for supplying a Docker client 
> configuration at application submission time. The auth tokens within the 
> client configuration are then used to pull images from private Docker 
> repositories/registries. Add the same support to the YARN Native Services 
> framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-1964:
--
Labels: Docker  (was: )

> Create Docker analog of the LinuxContainerExecutor in YARN
> --
>
> Key: YARN-1964
> URL: https://issues.apache.org/jira/browse/YARN-1964
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Arun C Murthy
>Assignee: Abin Shahab
>Priority: Major
>  Labels: Docker
> Fix For: 2.6.0
>
> Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch
>
>
> *This alpha feature has been deprecated in branch-2 and removed from trunk* 
> Please see https://issues.apache.org/jira/browse/YARN-5388
> Docker (https://www.docker.io/) is, increasingly, a very popular container 
> technology.
> In context of YARN, the support for Docker will provide a very elegant 
> solution to allow applications to *package* their software into a Docker 
> container (entire Linux file system incl. custom versions of perl, python 
> etc.) and use it as a blueprint to launch all their YARN containers with 
> requisite software environment. This provides both consistency (all YARN 
> containers will have the same software environment) and isolation (no 
> interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5505) Create an agent-less docker provider in the native-services framework

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5505:
--
Labels: Docker  (was: )

> Create an agent-less docker provider in the native-services framework
> -
>
> Key: YARN-5505
> URL: https://issues.apache.org/jira/browse/YARN-5505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
>  Labels: Docker
> Fix For: yarn-native-services
>
> Attachments: YARN-5505-yarn-native-services.001.patch, 
> YARN-5505-yarn-native-services.002.patch
>
>
> The Slider AM has a pluggable portion called a provider. Currently the only 
> provider implementation is the agent provider which contains the bulk of the 
> agent-related Java code. We can implement a docker provider that does not use 
> the agent and gets the information it needs directly from the NM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3988) DockerContainerExecutor should allow user specify "docker run" parameters

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-3988:
--
Labels: Docker  (was: )

> DockerContainerExecutor should allow user specify "docker run" parameters
> -
>
> Key: YARN-3988
> URL: https://issues.apache.org/jira/browse/YARN-3988
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Chen He
>Assignee: Chen He
>Priority: Major
>  Labels: Docker
>
> In the current DockerContainerExecutor, the "docker run" command has fixed 
> parameters:
> {code}
> String commandStr = commands.append(dockerExecutor)
>   .append(" ")
>   .append("run")
>   .append(" ")
>   .append("--rm --net=host")
>   .append(" ")
>   .append(" --name " + containerIdStr)
>   .append(localDirMount)
>   .append(logDirMount)
>   .append(containerWorkDirMount)
>   .append(" ")
>   .append(containerImageName)
>   .toString();
> {code}
> For example, it is not flexible if users want to start a docker container 
> with extra volume(s) attached or with other "docker run" parameters.
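
Below is a minimal sketch of the kind of flexibility requested: the command build is the same as above, but a caller-supplied list of extra "docker run" arguments is appended before the image name. The {{userRunArgs}} parameter and the mount strings in {{main}} are hypothetical, not taken from an actual patch.

{code:java}
import java.util.List;

public class DockerRunCommand {
  // Same base command as above, plus caller-supplied "docker run" parameters
  // such as "-v /data:/data" or "--memory 2g" appended before the image name.
  static String buildCommand(String dockerExecutor, String containerIdStr,
      String localDirMount, String logDirMount, String containerWorkDirMount,
      String containerImageName, List<String> userRunArgs) {
    StringBuilder commands = new StringBuilder();
    commands.append(dockerExecutor)
        .append(" run --rm --net=host")
        .append(" --name ").append(containerIdStr)
        .append(localDirMount)
        .append(logDirMount)
        .append(containerWorkDirMount);
    for (String arg : userRunArgs) {
      commands.append(" ").append(arg);
    }
    return commands.append(" ").append(containerImageName).toString();
  }

  public static void main(String[] args) {
    System.out.println(buildCommand("docker", "container_01",
        " -v /local:/local", " -v /logs:/logs", " -v /work:/work",
        "centos:7", List.of("-v /data:/data", "--memory 2g")));
  }
}
{code}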



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8181) Docker container run_time

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8181:
--
Labels: Docker  (was: )

> Docker container run_time
> -
>
> Key: YARN-8181
> URL: https://issues.apache.org/jira/browse/YARN-8181
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Seyyed Ahmad Javadi
>Priority: Major
>  Labels: Docker
>
> Hi All,
> I want to use the docker container runtime but could not solve the problem I 
> am facing. I am following the guide below and the NM log is as follows. I 
> cannot see any docker containers being created. It works when I use the 
> default LCE. Please also find how I submit a job at the end.
> Do you have any guide on how I can make the Docker runtime work?
> Could you please let me know how I can use the LCE binary to make sure my 
> docker setup is correct?
> I confirmed that "docker run" works fine. I really like this developing 
> feature and would like to contribute to it. Many thanks in advance.
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/DockerContainers.html]
> {code:java}
> NM LOG:
> ...
> 2018-04-19 11:49:24,568 INFO SecurityLogger.org.apache.hadoop.ipc.Server: 
> Auth successful for appattempt_1524151293356_0005_01 (auth:SIMPLE)
> 2018-04-19 11:49:24,580 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Start request for container_1524151293356_0005_01_01 by user ubuntu
> 2018-04-19 11:49:24,584 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Creating a new application reference for app application_1524151293356_0005
> 2018-04-19 11:49:24,584 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=ubuntu    
> IP=130.245.127.176    OPERATION=Start Container Request    
> TARGET=ContainerManageImpl    RESULT=SUCCESS    
> APPID=application_1524151293356_0005    
> CONTAINERID=container_1524151293356_0005_01_01
> 2018-04-19 11:49:24,585 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1524151293356_0005 transitioned from NEW to INITING
> 2018-04-19 11:49:24,585 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Adding container_1524151293356_0005_01_01 to application 
> application_1524151293356_0005
> 2018-04-19 11:49:24,585 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1524151293356_0005 transitioned from INITING to 
> RUNNING
> 2018-04-19 11:49:24,588 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1524151293356_0005_01_01 transitioned from NEW to 
> LOCALIZING
> 2018-04-19 11:49:24,588 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
> event CONTAINER_INIT for appId application_1524151293356_0005
> 2018-04-19 11:49:24,589 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Created localizer for container_1524151293356_0005_01_01
> 2018-04-19 11:49:24,616 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Writing credentials to the nmPrivate file 
> /tmp/hadoop-ubuntu/nm-local-dir/nmPrivate/container_1524151293356_0005_01_01.tokens
> 2018-04-19 11:49:28,090 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1524151293356_0005_01_01 transitioned from 
> LOCALIZING to SCHEDULED
> 2018-04-19 11:49:28,090 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Starting container [container_1524151293356_0005_01_01]
> 2018-04-19 11:49:28,212 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1524151293356_0005_01_01 transitioned from SCHEDULED 
> to RUNNING
> 2018-04-19 11:49:28,212 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Starting resource-monitoring for container_1524151293356_0005_01_01
> 2018-04-19 11:49:29,401 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Container container_1524151293356_0005_01_01 succeeded
> 2018-04-19 11:49:29,401 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1524151293356_0005_01_01 transitioned from RUNNING 
> to EXITED_WITH_SUCCESS
> 2018-04-19 11:49:29,401 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_1524151293356_0005_01_000

[jira] [Updated] (YARN-6160) Create an agent-less docker-less provider in the native services framework

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6160:
--
Labels: Docker  (was: )

> Create an agent-less docker-less provider in the native services framework
> --
>
> Key: YARN-6160
> URL: https://issues.apache.org/jira/browse/YARN-6160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
>  Labels: Docker
> Fix For: yarn-native-services
>
> Attachments: YARN-6160-yarn-native-services.001.patch, 
> YARN-6160-yarn-native-services.002.patch, 
> YARN-6160-yarn-native-services.003.patch
>
>
> The goal of the agent-less docker-less provider is to be able to use the YARN 
> native services framework when Docker is not installed or other methods of 
> app resource installation are preferable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8231) Dshell application fails when one of the docker containers gets killed

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8231:
--
Labels: Docker  (was: )

> Dshell application fails when one of the docker containers gets killed
> -
>
> Key: YARN-8231
> URL: https://issues.apache.org/jira/browse/YARN-8231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Priority: Critical
>  Labels: Docker
>
> 1) Launch dshell application
> {code}
> yarn  jar hadoop-yarn-applications-distributedshell-*.jar  -shell_command 
> "sleep 300" -num_containers 2 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker 
> -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest 
> -keep_containers_across_application_attempts -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> 2) Kill container_1524681858728_0012_01_02
> Expected behavior:
> The application should start a new instance and finish successfully.
> Actual behavior:
> The application failed as soon as the container was killed.
> {code:title=AM log}
> 18/04/27 23:05:12 INFO distributedshell.ApplicationMaster: Got response from 
> RM for container ask, completedCnt=1
> 18/04/27 23:05:12 INFO distributedshell.ApplicationMaster: 
> appattempt_1524681858728_0012_01 got container status for 
> containerID=container_1524681858728_0012_01_02, state=COMPLETE, 
> exitStatus=137, diagnostics=[2018-04-27 23:05:09.310]Container killed on 
> request. Exit code is 137
> [2018-04-27 23:05:09.331]Container exited with a non-zero exit code 137. 
> [2018-04-27 23:05:09.332]Killed by external signal
> 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Got response from 
> RM for container ask, completedCnt=1
> 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: 
> appattempt_1524681858728_0012_01 got container status for 
> containerID=container_1524681858728_0012_01_03, state=COMPLETE, 
> exitStatus=0, diagnostics=
> 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Container 
> completed successfully., containerId=container_1524681858728_0012_01_03
> 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Application 
> completed. Stopping running containers
> 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Application 
> completed. Signalling finish to RM
> 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Diagnostics., 
> total=2, completed=2, allocated=2, failed=1
> 18/04/27 23:08:46 INFO impl.AMRMClientImpl: Waiting for application to be 
> successfully unregistered.{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7416) Use "docker volume inspect" to make sure that volumes for GPU drivers/libs are properly mounted.

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7416:
--
Labels: Docker  (was: )

> Use "docker volume inspect" to make sure that volumes for GPU drivers/libs 
> are properly mounted. 
> -
>
> Key: YARN-7416
> URL: https://issues.apache.org/jira/browse/YARN-7416
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>  Labels: Docker
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6804) Allow custom hostname for docker containers in native services

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6804:
--
Labels: Docker  (was: )

> Allow custom hostname for docker containers in native services
> --
>
> Key: YARN-6804
> URL: https://issues.apache.org/jira/browse/YARN-6804
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
>  Labels: Docker
> Fix For: 2.9.0, yarn-native-services, 3.0.0-beta1
>
> Attachments: YARN-6804-branch-2.8.01.patch, 
> YARN-6804-trunk.004.patch, YARN-6804-trunk.005.patch, 
> YARN-6804-yarn-native-services.001.patch, 
> YARN-6804-yarn-native-services.002.patch, 
> YARN-6804-yarn-native-services.003.patch, 
> YARN-6804-yarn-native-services.004.patch, 
> YARN-6804-yarn-native-services.005.patch
>
>
> Instead of the default random docker container hostname, we could set a more 
> user-friendly hostname for the container. The default could be a hostname 
> based on the container ID, with an option for the AM to provide a different 
> hostname. In the case of the native services AM, we could provide the 
> hostname that would be created by the registry DNS server. Regardless of 
> whether or not registry DNS is enabled, this would be a more useful hostname 
> for the docker container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2466) Umbrella issue for Yarn launched Docker Containers

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-2466:
--
Labels: Docker  (was: )

> Umbrella issue for Yarn launched Docker Containers
> --
>
> Key: YARN-2466
> URL: https://issues.apache.org/jira/browse/YARN-2466
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.4.1
>Reporter: Abin Shahab
>Priority: Major
>  Labels: Docker
>
> Docker (https://www.docker.io/) is, increasingly, a very popular container 
> technology.
> In context of YARN, the support for Docker will provide a very elegant 
> solution to allow applications to package their software into a Docker 
> container (entire Linux file system incl. custom versions of perl, python 
> etc.) and use it as a blueprint to launch all their YARN containers with 
> requisite software environment. This provides both consistency (all YARN 
> containers will have the same software environment) and isolation (no 
> interference with whatever is installed on the physical machine).
> In addition to the software isolation mentioned above, Docker containers will 
> provide resource, network, and user-namespace isolation. 
> Docker provides resource isolation through cgroups, similar to 
> LinuxContainerExecutor. This prevents one job from taking other jobs' 
> resources (memory and CPU) on the same hadoop cluster. 
> User-namespace isolation will ensure that root in the container is mapped to 
> an unprivileged user on the host. This is currently being added to Docker.
> Network isolation will ensure that one user’s network traffic is completely 
> isolated from another user’s network traffic. 
> Last but not least, the interaction of Docker and Kerberos will have to 
> be worked out. These Docker containers must work in a secure hadoop 
> environment.
> Additional details are here: 
> https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8208) Add log statement for Docker client configuration file at INFO level

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8208:
--
Labels: Docker  (was: )

> Add log statement for Docker client configuration file at INFO level
> 
>
> Key: YARN-8208
> URL: https://issues.apache.org/jira/browse/YARN-8208
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Minor
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8208.001.patch, YARN-8208.002.patch, 
> YARN-8208.003.patch
>
>
> The log statement that indicates the source of the Docker client configuration 
> file should be moved to INFO level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6622) Document Docker work as experimental

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6622:
--
Labels: Docker  (was: )

> Document Docker work as experimental
> 
>
> Key: YARN-6622
> URL: https://issues.apache.org/jira/browse/YARN-6622
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Blocker
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: YARN-6622.001.patch
>
>
> We should update the Docker support documentation calling out the Docker work 
> as experimental.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-3324:
--
Labels: BB2015-05-TBR Docker  (was: BB2015-05-TBR)

> TestDockerContainerExecutor should clean test docker image from local 
> repository after test is done
> ---
>
> Key: YARN-3324
> URL: https://issues.apache.org/jira/browse/YARN-3324
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0
>Reporter: Chen He
>Priority: Major
>  Labels: BB2015-05-TBR, Docker
> Attachments: YARN-3324-branch-2.6.0.002.patch, 
> YARN-3324-trunk.002.patch
>
>
> Currently, TestDockerContainerExecutor only cleans the temp directory in the local 
> file system but leaves the test docker image in the local docker repository. The 
> image should be cleaned up as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6091) The AppMaster registration failed when using Docker on LinuxContainer

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6091:
--
Labels: Docker  (was: )

> The AppMaster registration failed when using Docker on LinuxContainer 
> 
>
> Key: YARN-6091
> URL: https://issues.apache.org/jira/browse/YARN-6091
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.8.1
> Environment: CentOS
>Reporter: zhengchenyu
>Assignee: Eric Badger
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-6091.001.patch, YARN-6091.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> On some servers, when I use Docker on LinuxContainer, I found that the 
> AppMaster failed to register with the ResourceManager. This did not happen on 
> other servers. 
> I found that pclose (in container-executor.c) returns different values on 
> different servers, even though the process launched by popen is running 
> normally. Some servers return 0, and others return 13. 
> Because yarn regards the application as failed when pclose returns nonzero, 
> it removes the AMRMToken, and the AppMaster registration then fails because 
> the ResourceManager has already removed this application's token. 
> In container-executor.c, the judgement condition is whether the return code 
> is zero. But according to the pclose man page, only a return value of -1 
> indicates an error. So I changed the judgement condition, which solves this 
> problem. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7361) Improve the docker container runtime documentation

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7361:
--
Labels: Docker  (was: )

> Improve the docker container runtime documentation
> --
>
> Key: YARN-7361
> URL: https://issues.apache.org/jira/browse/YARN-7361
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 2.8.3, 3.1.0, 2.9.1, 3.0.1
>
> Attachments: YARN-7361.001.patch, YARN-7361.002.patch
>
>
> During review of YARN-7230, it was found that 
> yarn.nodemanager.runtime.linux.docker.capabilities is missing from the docker 
> containers documentation in most of the active branches. We can also improve 
> the warning that was introduced in YARN-6622.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8285) Remove unused environment variables from the Docker runtime

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8285:
--
Labels: Docker  (was: )

> Remove unused environment variables from the Docker runtime
> ---
>
> Key: YARN-8285
> URL: https://issues.apache.org/jira/browse/YARN-8285
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Eric Badger
>Priority: Trivial
>  Labels: Docker
> Attachments: YARN-8285.001.patch
>
>
> YARN-7430 enabled user remapping for Docker containers by default. As a 
> result, YARN_CONTAINER_RUNTIME_DOCKER_RUN_ENABLE_USER_REMAPPING is no longer 
> used and can be removed.
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE_FILE was added in the original 
> implementation, but was never used and can be removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7224) Support GPU isolation for docker container

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7224:
--
Labels: Docker  (was: )

> Support GPU isolation for docker container
> --
>
> Key: YARN-7224
> URL: https://issues.apache.org/jira/browse/YARN-7224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7224.001.patch, YARN-7224.002-wip.patch, 
> YARN-7224.003.patch, YARN-7224.004.patch, YARN-7224.005.patch, 
> YARN-7224.006.patch, YARN-7224.007.patch, YARN-7224.008.patch, 
> YARN-7224.009.patch
>
>
> This patch is to address issues when a docker container is being used:
> 1. GPU driver and nvidia libraries: If GPU drivers and NV libraries are 
> pre-packaged inside the docker image, they could conflict with the driver and 
> nvidia libraries installed on the host OS. An alternative solution is to detect 
> the host OS's installed drivers and devices and mount them when launching the 
> docker container. Please refer to \[1\] for more details. 
> 2. Image detection: 
> From \[2\], the challenge is: 
> bq. Mounting user-level driver libraries and device files clobbers the 
> environment of the container, it should be done only when the container is 
> running a GPU application. The challenge here is to determine if a given 
> image will be using the GPU or not. We should also prevent launching 
> containers based on a Docker image that is incompatible with the host NVIDIA 
> driver version, you can find more details on this wiki page.
> 3. GPU isolation.
> *Proposed solution*:
> a. Use nvidia-docker-plugin \[3\] to address issue #1; this is the same 
> solution used by K8S \[4\]. Issue #2 could be addressed in a separate JIRA.
> We won't ship nvidia-docker-plugin with our releases, and we require the cluster 
> admin to preinstall nvidia-docker-plugin to use GPU+docker support on YARN. 
> "nvidia-docker" is a wrapper of the docker binary which can address #3 as well; 
> however, "nvidia-docker" doesn't provide the same semantics as docker, and it 
> needs additional environment setup such as PATH/LD_LIBRARY_PATH to use 
> it. To avoid introducing additional issues, we plan to use the 
> nvidia-docker-plugin + docker binary approach.
> b. To address GPU drivers and nvidia libraries, we use nvidia-docker-plugin 
> \[3\] to create a volume which includes GPU-related libraries and mount it 
> when the docker container is launched. Changes include: 
> - Instead of using {{volume-driver}}, this patch added a {{docker volume 
> create}} command to c-e and the NM Java side. The reason is {{volume-driver}} can 
> only use a single volume driver for each launched docker container.
> - Updated {{c-e}} and the Java side so that if a mounted volume is a named volume in 
> docker, file existence checking is skipped. (Named volumes still need to be added to 
> the permitted list of container-executor.cfg).
> c. To address the isolation issue:
> We found that cgroup + docker doesn't work under newer docker versions which 
> use {{runc}} as the default runtime. Setting {{--cgroup-parent}} to a cgroup 
> which includes any {{devices.deny}} causes the docker container to fail to launch.
> Instead, this patch passes the allowed GPU devices via {{--device}} to the docker 
> launch command.
> References:
> \[1\] https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver
> \[2\] https://github.com/NVIDIA/nvidia-docker/wiki/Image-inspection
> \[3\] https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin
> \[4\] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
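
To illustrate point (c) above, here is a toy sketch of building the {{--device}} arguments for the GPUs assigned to a container. The device paths and names below are illustrative assumptions, not taken from the patch.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class GpuDeviceArgs {
  // One --device entry per GPU minor number the scheduler allowed for this
  // container, plus the control devices a GPU container typically needs.
  static List<String> deviceArgs(List<Integer> allowedGpuMinorNumbers) {
    List<String> args = new ArrayList<>();
    args.add("--device=/dev/nvidiactl");
    args.add("--device=/dev/nvidia-uvm");
    for (int minor : allowedGpuMinorNumbers) {
      args.add("--device=/dev/nvidia" + minor);
    }
    return args;
  }

  public static void main(String[] args) {
    System.out.println(deviceArgs(List.of(0, 1)));
  }
}
{code}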



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7811) Service AM should use configured default docker network

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7811:
--
Labels: Docker  (was: )

> Service AM should use configured default docker network
> ---
>
> Key: YARN-7811
> URL: https://issues.apache.org/jira/browse/YARN-7811
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7811.01.patch
>
>
> Currently the DockerProviderService used by the Service AM hardcodes a 
> default of bridge for the docker network. We already have a YARN 
> configuration property for default network, so the Service AM should honor 
> that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5879) Correctly handle docker.image and launch command when unique_component_support is specified

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5879:
--
Labels: Docker  (was: )

> Correctly handle docker.image and launch command when 
> unique_component_support is specified
> ---
>
> Key: YARN-5879
> URL: https://issues.apache.org/jira/browse/YARN-5879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Priority: Major
>  Labels: Docker
> Attachments: YARN-5879-yarn-native-services.poc.1.patch
>
>
> Found two issues. When {{unique_component_support}} is set to true, the server 
> will return an error message like:
> {code}
> {
>   "diagnostics": "Property docker.image not specified for 
> {component-name}-{component-id}"
> }
> {code}
> In addition, the launch command cannot handle patterns like 
> {{COMPONENT_ID}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8287) Update documentation and yarn-default related to the Docker runtime

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8287:
--
Labels: Docker  (was: )

> Update documentation and yarn-default related to the Docker runtime
> ---
>
> Key: YARN-8287
> URL: https://issues.apache.org/jira/browse/YARN-8287
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Priority: Minor
>  Labels: Docker
>
> There are a few typos and omissions in the documentation and yarn-default wrt 
> running Docker containers on YARN. Below is what I noticed, but a more 
> thorough review is still needed:
>  * docker.allowed.volume-drivers is not documented
>  * None of the GPU or FPGA related items are in the Docker docs.
>  * "To run without any capabilites," - typo in yarn-default.xml
>  * remove    from yarn-default.xml
>  * yarn.nodemanager.runtime.linux.docker.delayed-removal.allowed missing from 
> docs
>  * yarn.nodemanager.runtime.linux.docker.stop.grace-period missing from docs
>  * The user remapping features are missing from the docs, we should 
> explicitly call this out.
>  * The privileged container section could use a bit of rework to outline the 
> risks of the feature.
>  * Is it time to remove the security warnings? The community has made many 
> improvements since that warning was added. 
>  * "path within the contatiner" typo



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8160:
--
Labels: Docker  (was: )

> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> 
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
>
> Ability to upgrade dockerized  yarn native services.
> Ref: YARN-5637



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8284) get_docker_command refactoring

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8284:
--
Labels: Docker  (was: )

> get_docker_command refactoring
> --
>
> Key: YARN-8284
> URL: https://issues.apache.org/jira/browse/YARN-8284
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Jason Lowe
>Assignee: Eric Badger
>Priority: Minor
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8284.001.patch, YARN-8284.002.patch
>
>
> YARN-8274 occurred because get_docker_command's helper functions each have to 
> remember to put the docker binary as the first argument.  This is error prone 
> and causes code duplication for each of the helper functions.  It would be 
> safer and simpler if get_docker_command initialized the docker binary 
> argument in one place and each of the helper functions only added the 
> arguments specific to their particular docker sub-command.
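
The refactoring idea is language-agnostic. Here is a minimal Java-flavored sketch of the pattern; the real code lives in the C container-executor, and all names below are illustrative.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class DockerCommandBuilder {
  // The docker binary is added in exactly one place.
  static List<String> getDockerCommand(String dockerBinary, String subCommand,
      List<String> subCommandArgs) {
    List<String> command = new ArrayList<>();
    command.add(dockerBinary);
    command.add(subCommand);
    command.addAll(subCommandArgs);
    return command;
  }

  // Helpers only supply the arguments specific to their sub-command.
  static List<String> inspectArgs(String containerId) {
    return List.of("--format={{.State.Status}}", containerId);
  }

  public static void main(String[] args) {
    System.out.println(getDockerCommand("/usr/bin/docker", "inspect",
        inspectArgs("container_e01_1234567890123_0001_01_000002")));
  }
}
{code}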



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7878) Docker container IP detail missing when service is in STABLE state

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7878:
--
Labels: Docker  (was: )

> Docker container IP detail missing when service is in STABLE state
> --
>
> Key: YARN-7878
> URL: https://issues.apache.org/jira/browse/YARN-7878
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Priority: Critical
>  Labels: Docker
>
> Scenario
>  1) Launch Hbase on docker app
>  2) Validate yarn service status using cli
> {code:java}
> {"name":"hbase-app-with-docker","id":"application_1517516543573_0012","artifact":{"id":"hbase-centos","type":"DOCKER"},"lifetime":3519,"components":[{"name":"hbasemaster","dependencies":[],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"2048"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_MASTER_OPTS":"-Xmx2048m
>  
> -Xms1024m","HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_02","ip":"10.0.0.9","hostname":"hbasemaster-0.hbase-app-with-docker.hrt-qa.test.com","state":"READY","launch_time":1517533029963,"bare_host":"xxx","component_name":"hbasemaster-0"}],"launch_command":"sleep
>  15; /usr/hdp/current/hbase-master/bin/hbase master 
> start","number_of_containers":1,"run_privileged_container":false},{"name":"regionserver","dependencies":["hbasemaster"],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"2048"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_REGIONSERVER_OPTS":"-XX:CMSInitiatingOccupancyFraction=70
>  -Xmx2048m 
> -Xms1024m","HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.regionserver.hostname":"${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_05","state":"READY","launch_time":1517533059022,"bare_host":"xxx","component_name":"regionserver-0"}],"launch_command":"sleep
>  15; /usr/hdp/current/hbase-regionserver/bin/hbase regionserver 
> start","number_of_containers":1,"run_privileged_container":false},{"name":"hbaseclient","dependencies":[],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"1024"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_03","ip":"10.0.0.8","hostname":"hbaseclient-0.hbase-app-with-docker.hrt-qa.test.com","state":"READY","launch_time":1517533029964,"bare_host":"xxx","component_name":"hbaseclient-0"}],"launch_command":"sleep
>  
> infinity","number_of_containers":1,"run_privileged_container":false}],"configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_LOG_DIR":""},"files":[{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hd

[jira] [Updated] (YARN-7805) Yarn should update container as failed on docker container failure

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7805:
--
Labels: Docker  (was: )

> Yarn should update container as failed on docker container failure
> --
>
> Key: YARN-7805
> URL: https://issues.apache.org/jira/browse/YARN-7805
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Priority: Major
>  Labels: Docker
>
> Steps:
> Start the hbase yarn service example on docker.
> When the Hbase master fails, it leads the master daemon's docker container to fail.
> {code}
> [root@xx bin]# docker ps -a
> CONTAINER IDIMAGE 
> COMMAND  CREATED STATUS   
>   PORTS   NAMES
> a57303b1a736x/xxxhbase:x.x.x.x.0.0.0   "bash /grid/0/hadoop/"   5 
> minutes ago   Exited (1) 4 minutes ago   
> container_e07_1516734339938_0018_01_02
> [root@xxx bin]# docker exec -it a57303b1a736 bash
> Error response from daemon: Container 
> a57303b1a7364a733428ec76581368253e5a701560a510204b8c302e3bbeed26 is not 
> running
> {code}
> Expected behavior:
> Yarn should mark this container as failed and start a new docker container.
> Actual behavior:
> Yarn did not detect that the container had failed. It kept showing the container 
> status as Running.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7412) test_docker_util.test_check_mount_permitted() is failing

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7412:
--
Labels: Docker  (was: )

> test_docker_util.test_check_mount_permitted() is failing
> 
>
> Key: YARN-7412
> URL: https://issues.apache.org/jira/browse/YARN-7412
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Haibo Chen
>Assignee: Eric Badger
>Priority: Critical
>  Labels: Docker
> Fix For: 3.0.0
>
> Attachments: YARN-7412.001.patch
>
>
> Test output (two failing test cases in TestDockerUtil):
> {code}
> /home/haibochen/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc:444
>   Expected: itr->second
>     Which is: 1
>   To be equal to: ret
>     Which is: 0
>   for input /usr/bin/touch
>
> /home/haibochen/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc:462
>   Expected: expected[i]
>     Which is: "/usr/bin/touch"
>   To be equal to: ptr[i]
>     Which is: "/bin/touch"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7717) Add configuration consistency for module.enabled and docker.privileged-containers.enabled

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7717:
--
Labels: Docker  (was: )

> Add configuration consistency for module.enabled and 
> docker.privileged-containers.enabled
> -
>
> Key: YARN-7717
> URL: https://issues.apache.org/jira/browse/YARN-7717
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Yesha Vora
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7717.001.patch, YARN-7717.002.patch, 
> YARN-7717.003.patch, YARN-7717.004.patch
>
>
> container-executor.cfg has two properties related to dockerization. 
> 1)  module.enabled = true/false
> 2) docker.privileged-containers.enabled = 1/0
> Here, the two properties take different values to enable / disable the feature: 
> module.enabled takes a true/false string while docker.privileged-containers.enabled 
> takes a 1/0 integer value. 
> The behavior of these properties should be consistent. Both properties should 
> take a true or false string as the value to enable or disable the feature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-2981:
--
Labels: Docker oct16-easy  (was: oct16-easy)

> DockerContainerExecutor must support a Cluster-wide default Docker image
> 
>
> Key: YARN-2981
> URL: https://issues.apache.org/jira/browse/YARN-2981
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abin Shahab
>Assignee: Abin Shahab
>Priority: Major
>  Labels: Docker, oct16-easy
> Attachments: YARN-2981.patch, YARN-2981.patch, YARN-2981.patch, 
> YARN-2981.patch
>
>
> This allows the yarn administrator to add a cluster-wide default docker image 
> that will be used when there is no per-job override of the docker image. With 
> this feature, it would be convenient for newer applications like slider to 
> launch inside a cluster-default docker container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7677) Docker image cannot set HADOOP_CONF_DIR

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7677:
--
Labels: Docker  (was: )

> Docker image cannot set HADOOP_CONF_DIR
> ---
>
> Key: YARN-7677
> URL: https://issues.apache.org/jira/browse/YARN-7677
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Eric Badger
>Assignee: Jim Brennan
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7677.001.patch, YARN-7677.002.patch, 
> YARN-7677.003.patch, YARN-7677.004.patch, YARN-7677.005.patch, 
> YARN-7677.006.patch, YARN-7677.007.patch
>
>
> Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether 
> it's set by the user or not. It completely bypasses the whitelist and so 
> there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes 
> problems in the Docker use case where Docker containers will set up their own 
> environment and have their own {{HADOOP_CONF_DIR}} preset in the image 
> itself. 
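
One way to think about the desired behavior: whitelisted variables should be written as a fallback rather than an unconditional override, so a value baked into the Docker image wins. A toy sketch of emitting such export lines (not the actual ContainerLaunch code; names and values are illustrative):

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class LaunchEnvSketch {
  // Whitelisted variables use a shell default expansion so a value already
  // present in the container's (or image's) environment takes precedence.
  static String exportLine(String var, String nmValue, boolean whitelisted) {
    if (whitelisted) {
      return "export " + var + "=${" + var + ":-\"" + nmValue + "\"}";
    }
    return "export " + var + "=\"" + nmValue + "\"";
  }

  public static void main(String[] args) {
    Map<String, String> nmEnv = new LinkedHashMap<>();
    nmEnv.put("HADOOP_CONF_DIR", "/etc/hadoop/conf");
    nmEnv.put("CONTAINER_ID", "container_01");
    for (Map.Entry<String, String> e : nmEnv.entrySet()) {
      boolean whitelisted = e.getKey().equals("HADOOP_CONF_DIR");
      System.out.println(exportLine(e.getKey(), e.getValue(), whitelisted));
    }
  }
}
{code}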



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5689) Update native services REST API to use agentless docker provider

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5689:
--
Labels: Docker  (was: )

> Update native services REST API to use agentless docker provider
> 
>
> Key: YARN-5689
> URL: https://issues.apache.org/jira/browse/YARN-5689
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Assignee: Gour Saha
>Priority: Major
>  Labels: Docker
> Fix For: yarn-native-services
>
> Attachments: YARN-5689-yarn-native-services.001.patch, 
> YARN-5689-yarn-native-services.002.patch
>
>
> The initial version of the native services REST API uses the agent provider. 
> It should be converted to use the new docker provider instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8265:
--
Labels: Docker  (was: )

> Service AM should retrieve new IP for docker container relaunched by NM
> ---
>
> Key: YARN-8265
> URL: https://issues.apache.org/jira/browse/YARN-8265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Billie Rinaldi
>Priority: Critical
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8265.001.patch, YARN-8265.002.patch, 
> YARN-8265.003.patch
>
>
> When a docker container is restarted, it gets a new IP, but the service AM 
> only retrieves one IP for a container and then cancels the container status 
> retriever. I suspect the issue would be solved by restarting the retriever 
> (if it has been canceled) when the onContainerRestart callback is received, 
> but we'll have to do some testing to make sure this works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8284) get_docker_command refactoring

2018-05-16 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8284:
--
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-3611

> get_docker_command refactoring
> --
>
> Key: YARN-8284
> URL: https://issues.apache.org/jira/browse/YARN-8284
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Jason Lowe
>Assignee: Eric Badger
>Priority: Minor
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8284.001.patch, YARN-8284.002.patch
>
>
> YARN-8274 occurred because get_docker_command's helper functions each have to 
> remember to put the docker binary as the first argument.  This is error prone 
> and causes code duplication for each of the helper functions.  It would be 
> safer and simpler if get_docker_command initialized the docker binary 
> argument in one place and each of the helper functions only added the 
> arguments specific to their particular docker sub-command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477941#comment-16477941
 ] 

Jason Lowe commented on YARN-8292:
--

I'm still confused about the Resources.isAnyMajorResourceZero(rc, 
toObtainAfterPreemption) clause in the doPreempt conditional.  If we add a 
rarely-requested resource dimension, it is likely to be often zero in a queue's 
usage and therefore zero in toObtainAfterPremption.  
Resources.isAnyMajorResourceZero(rc, toObtainAfterPreemption) will then be 
always true, and that seems irrelevant to whether we want to keep preempting or 
not.

If I understand the proposal correctly, I think the check for a zero resource 
can be dropped and it simplifies to the toObtainAfterPreemption component-wise 
max'd with zero is less than the amount to obtain from the partition (after 
being max'd with zero).  In other words, we want to preempt as long as we have 
some resources we want to obtain from the partition and preempting the 
container makes progress on at least one of the resource dimensions being 
requested from the partition.
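
Read literally, the condition in the last paragraph can be sketched as follows. This is a toy illustration over plain int vectors, not the Hadoop Resource/ResourceCalculator API, and all names are made up.

{code:java}
import java.util.Arrays;

public class PreemptCheck {
  // Component-wise max of a resource vector with zero.
  static int[] maxWithZero(int[] r) {
    int[] out = new int[r.length];
    for (int i = 0; i < r.length; i++) {
      out[i] = Math.max(r[i], 0);
    }
    return out;
  }

  static int[] subtract(int[] a, int[] b) {
    int[] out = new int[a.length];
    for (int i = 0; i < a.length; i++) {
      out[i] = a[i] - b[i];
    }
    return out;
  }

  // Preempt as long as something is still wanted from the partition and taking
  // this container makes progress on at least one requested dimension.
  static boolean shouldPreempt(int[] toObtainByPartition, int[] containerResource) {
    int[] before = maxWithZero(toObtainByPartition);
    int[] after = maxWithZero(subtract(toObtainByPartition, containerResource));
    boolean anythingToObtain = Arrays.stream(before).anyMatch(v -> v > 0);
    boolean progress = false;
    for (int i = 0; i < before.length; i++) {
      if (after[i] < before[i]) {
        progress = true;
      }
    }
    return anythingToObtain && progress;
  }

  public static void main(String[] args) {
    // To-obtain for the partition is <0, 20>; candidate containers are <2, 3>.
    System.out.println(shouldPreempt(new int[]{0, 20}, new int[]{2, 3})); // true
    System.out.println(shouldPreempt(new int[]{0, 0}, new int[]{2, 3}));  // false
  }
}
{code}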


> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8292.001.patch
>
>
>  
>  This is an example of the problem: (Same if we have more than 2 resources)
>   
> Let's say we have 3 queues A/B/C. All containers have equal size <2, 3>.
>  
> ||Queue||Guaranteed||Used ||Pending||
> |A|<20, 10>|<20,30>| |
> |B|<20, 10>|0|0|
> |C|<20, 10>|0|<20, 30>|
> | | | | |
>  
> Under the current logic, A's calculated to-preempt (how much resource other queues 
> can preempt) will be <0, 20>. The preemption will not happen. However, under 
> the context of DRC, queue A is using more resources than guaranteed, so queue 
> C will be starved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8071) Add ability to specify nodemanager environment variables individually

2018-05-16 Thread Jim Brennan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477936#comment-16477936
 ] 

Jim Brennan commented on YARN-8071:
---

[~jlowe], thanks for the review:

 
{quote}The changes to TestContainerLaunch#testPrependDistcache appear to be 
unnecessary?
{quote}
They were intentional. When I was testing my new test case, I realized that 
passing the empty set for the {{nmVars}} argument leads to exceptions in 
{{addToEnvMap()}}, so I fixed the testPrependDistcache() cases as well - I 
assume this Windows-only test must have been failing without that fix.
 

> Add ability to specify nodemanager environment variables individually
> -
>
> Key: YARN-8071
> URL: https://issues.apache.org/jira/browse/YARN-8071
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-8071.001.patch, YARN-8071.002.patch, 
> YARN-8071.003.patch
>
>
> YARN-6830 describes a problem where environment variables that contain commas 
> cannot be specified via {{-Dmapreduce.map.env}}.
> For example:
> {{-Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS=/tmp/foo,/tmp/bar"}}
> will set {{MOUNTS}} to just {{/tmp/foo}}, because the value is split at the 
> embedded comma.
> In that Jira, [~aw] suggested that we change the API to provide a way to 
> specify environment variables individually, the same way that Spark does.
> {quote}Rather than fight with a regex why not redefine the API instead?
>  
> -Dmapreduce.map.env.MODE=bar
>  -Dmapreduce.map.env.IMAGE_NAME=foo
>  -Dmapreduce.map.env.MOUNTS=/tmp/foo,/tmp/bar
> ...
> e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar
> This greatly simplifies the input validation needed and makes it clear what 
> is actually being defined.
> {quote}
> The mapreduce properties were dealt with in [MAPREDUCE-7069].  This Jira will 
> address the YARN properties.
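
To illustrate why the per-variable form sidesteps the comma problem, here is a 
minimal, self-contained sketch. The {{parseEnv}} helper and the plain-Map input 
are invented for illustration; the actual change is wired into the YARN/MR 
configuration handling, not a standalone parser like this.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: collect entries under a per-variable prefix into an
// environment map, without ever splitting a value on commas.
public final class PerVariableEnvSketch {

  static Map<String, String> parseEnv(Map<String, String> conf, String prefix) {
    Map<String, String> env = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : conf.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        env.put(e.getKey().substring(prefix.length()), e.getValue());
      }
    }
    return env;
  }

  public static void main(String[] unused) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put("mapreduce.map.env.MODE", "bar");
    conf.put("mapreduce.map.env.IMAGE_NAME", "foo");
    conf.put("mapreduce.map.env.MOUNTS", "/tmp/foo,/tmp/bar");
    // MOUNTS keeps its embedded comma, which the old comma-separated form lost.
    System.out.println(parseEnv(conf, "mapreduce.map.env."));
    // {MODE=bar, IMAGE_NAME=foo, MOUNTS=/tmp/foo,/tmp/bar}
  }
}
{code}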






[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2018-05-16 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477933#comment-16477933
 ] 

genericqa commented on YARN-4599:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
21s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  5m 
28s{color} | {color:red} root in trunk failed. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 26m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
43s{color} | {color:green} root: The patch generated 0 new + 235 unchanged - 1 
fixed = 235 total (was 236) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 21s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 48s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}191m 43s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.util.TestDiskChecker |
|   | hadoop.util.TestReadWriteDiskValidator |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-4599 |
| JIRA Patch URL | 
https:/

[jira] [Commented] (YARN-8123) Skip compiling old hamlet package when the Java version is 10 or upper

2018-05-16 Thread Dinesh Chitlangia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477930#comment-16477930
 ] 

Dinesh Chitlangia commented on YARN-8123:
-

[~tasanuma0829] - Thank you for reviewing the patch.

[~ajisakaa] - Thank you for committing the changes.

> Skip compiling old hamlet package when the Java version is 10 or upper
> --
>
> Key: YARN-8123
> URL: https://issues.apache.org/jira/browse/YARN-8123
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
> Environment: Java 10 or upper
>Reporter: Akira Ajisaka
>Assignee: Dinesh Chitlangia
>Priority: Major
>  Labels: newbie
> Fix For: 3.2.0
>
> Attachments: YARN-8123.001.patch
>
>
> HADOOP-11423 skips compiling the old hamlet package when the Java version is 
> 9; however, it is not skipped when the Java version is 10 or later. We need 
> to fix that.





