[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miklos Szegedi updated YARN-4599:
---------------------------------
    Attachment: YARN-4599.015.patch

> Set OOM control for memory cgroups
> ----------------------------------
>
>                 Key: YARN-4599
>                 URL: https://issues.apache.org/jira/browse/YARN-4599
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.9.0
>            Reporter: Karthik Kambatla
>            Assignee: Miklos Szegedi
>            Priority: Major
>              Labels: oct16-medium
>         Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch,
> YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch,
> YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch,
> YARN-4599.007.patch, YARN-4599.008.patch, YARN-4599.009.patch,
> YARN-4599.010.patch, YARN-4599.011.patch, YARN-4599.012.patch,
> YARN-4599.013.patch, YARN-4599.014.patch, YARN-4599.015.patch,
> YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch
>
> YARN-1856 adds memory cgroup enforcement support. We should also explicitly
> set OOM control so that containers are not killed as soon as they go over
> their usage. Today, one could set the swappiness to control this, but
> clusters with swap turned off exist.
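For context on the mechanism being proposed: in the cgroups v1 memory controller, each cgroup exposes a memory.oom_control file, and writing "1" to it disables the kernel OOM killer for that cgroup, so tasks that hit the memory limit are paused rather than killed and an external agent (here, the NodeManager) can decide which container to sacrifice. A minimal sketch of that step, assuming an illustrative cgroup mount point and hierarchy name rather than the exact layout the patch uses:

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Sketch: disable the kernel OOM killer for a container's memory cgroup so
 * the container is frozen instead of killed when it exceeds its limit.
 * The mount point and hierarchy name below are assumptions for illustration.
 */
public class OomControlSketch {
  private static final Path CGROUP_MEMORY_ROOT =
      Paths.get("/sys/fs/cgroup/memory/hadoop-yarn"); // assumed location

  public static void disableOomKiller(String containerId) throws IOException {
    Path oomControl =
        CGROUP_MEMORY_ROOT.resolve(containerId).resolve("memory.oom_control");
    // "1" tells the kernel to pause, not kill, tasks in this cgroup when the
    // memory limit is reached; a supervisor must then resolve the OOM.
    Files.write(oomControl, "1".getBytes(StandardCharsets.UTF_8));
  }
}
{code}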
[jira] [Resolved] (YARN-8302) ATS v2 should handle HBase connection issue properly
[ https://issues.apache.org/jira/browse/YARN-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S resolved YARN-8302.
-------------------------------------
    Resolution: Won't Fix

Closing this JIRA as Won't Fix, since it is a configuration issue. The HBase client timeout can be reduced by tuning the configurations discussed in the comments above.

> ATS v2 should handle HBase connection issue properly
> ----------------------------------------------------
>
>                 Key: YARN-8302
>                 URL: https://issues.apache.org/jira/browse/YARN-8302
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: ATSv2
>    Affects Versions: 3.1.0
>            Reporter: Yesha Vora
>            Priority: Major
>
> An ATS v2 call times out with the below error when it can't connect to the HBase instance.
> {code}
> bash-4.2$ curl -i -k -s -1 -H 'Content-Type: application/json' -H 'Accept: application/json' --max-time 5 --negotiate -u : 'https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/YARN_CONTAINER?fields=ALL&_=1526425686092'
> curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
> {code}
> {code:title=ATS log}
> 2018-05-15 23:10:03,623 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=7, retries=7, started=8165 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:13,651 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=8, retries=8, started=18192 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:23,730 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=9, retries=9, started=28272 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:33,788 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=10, retries=10, started=38330 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow, ,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
> {code}
> There are two issues here:
> 1) Check why ATS can't connect to HBase.
> 2) In case of a connection error, the ATS call should not simply time out; it should fail with a proper error.
[jira] [Commented] (YARN-8302) ATS v2 should handle HBase connection issue properly
[ https://issues.apache.org/jira/browse/YARN-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478532#comment-16478532 ]

Rohith Sharma K S commented on YARN-8302:
-----------------------------------------

If HBase is down for any reason, the HBase client retries for about 20 minutes with the default configuration. Reducing the default value of *hbase.client.retries.number* from 15 to 7 shortens this drastically, from 20 minutes to about 1.5 minutes.
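For concreteness, a sketch of the tuning described above. hbase.client.retries.number is the standard HBase client key; the companion operation-timeout value is an illustrative assumption, and in an ATSv2 deployment these settings would normally live in the hbase-site.xml read by the timeline reader rather than be set in code:

{code:java}
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch: cap how long a timeline reader blocks on an unreachable HBase.
 * With the default hbase.client.retries.number of 15, the retry loop runs
 * for roughly 20 minutes; 7 retries brings it down to about 1.5 minutes.
 */
public class TimelineHBaseRetrySketch {
  public static Configuration tunedClientConf() {
    Configuration conf = new Configuration();
    conf.setInt("hbase.client.retries.number", 7);
    // Illustrative companion knob (assumption): bound each operation too.
    conf.setInt("hbase.client.operation.timeout", 120_000);
    return conf;
  }
}
{code}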
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478490#comment-16478490 ]

genericqa commented on YARN-4599:
---------------------------------

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 32s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 7 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 19s | Maven dependency ordering for branch |
| +1 | mvninstall | 25m 54s | trunk passed |
| +1 | compile | 29m 43s | trunk passed |
| +1 | checkstyle | 3m 15s | trunk passed |
| +1 | mvnsite | 20m 13s | trunk passed |
| +1 | shadedclient | 33m 49s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . |
| +1 | findbugs | 3m 44s | trunk passed |
| +1 | javadoc | 6m 0s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 21s | Maven dependency ordering for patch |
| +1 | mvninstall | 29m 4s | the patch passed |
| +1 | compile | 28m 56s | the patch passed |
| +1 | cc | 28m 56s | the patch passed |
| +1 | javac | 28m 56s | the patch passed |
| +1 | checkstyle | 3m 25s | root: The patch generated 0 new + 235 unchanged - 1 fixed = 235 total (was 236) |
| +1 | mvnsite | 19m 1s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 11s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . |
| +1 | findbugs | 3m 51s | the patch passed |
| +1 | javadoc | 5m 14s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 174m 38s | root in the patch failed. |
| +1 | asflicense | 0m 39s | The patch does not generate ASF License warnings. |
| | | 377m 5s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestPread |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-4599 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923724/YAR
[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miklos Szegedi updated YARN-4599:
---------------------------------
    Attachment: YARN-4599.014.patch
[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miklos Szegedi updated YARN-4599:
---------------------------------
    Attachment: YARN-4599.013.patch
[jira] [Commented] (YARN-8234) Improve RM system metrics publisher's performance by pushing events to timeline server in batch
[ https://issues.apache.org/jira/browse/YARN-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478467#comment-16478467 ]

Hu Ziqian commented on YARN-8234:
---------------------------------

Hi [~leftnoteasy],

hadoop.yarn.server.resourcemanager.TestClientRMTokens
hadoop.yarn.server.resourcemanager.TestAMAuthorization
hadoop.yarn.server.resourcemanager.scheduler.fair.TestSchedulingPolicy

These tests pass on my laptop; I have no idea why they failed in genericqa. I checked them, and none of them uses SystemMetricsPublisher, which is what my patch changes. The other test and findbugs issues are all fixed.

> Improve RM system metrics publisher's performance by pushing events to
> timeline server in batch
> -----------------------------------------------------------------------
>
>                 Key: YARN-8234
>                 URL: https://issues.apache.org/jira/browse/YARN-8234
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager, timelineserver
>    Affects Versions: 2.8.3
>            Reporter: Hu Ziqian
>            Assignee: Hu Ziqian
>            Priority: Major
>         Attachments: YARN-8234-branch-2.8.3.001.patch,
> YARN-8234-branch-2.8.3.002.patch, YARN-8234-branch-2.8.3.003.patch
>
> When the system metrics publisher is enabled, the RM pushes events to the
> timeline server via its RESTful API. If the cluster load is heavy, many
> events are sent to the timeline server and the timeline server's event
> handler thread gets locked. YARN-7266 discusses the details of this problem.
> Because of the lock, the timeline server can't receive events as fast as the
> RM generates them, and many timeline events pile up in the RM's memory.
> Eventually those events consume all of the RM's memory, and the RM starts a
> full GC (which causes a JVM stop-the-world pause and a timeout from the RM
> to ZooKeeper) or even hits an OOM.
> The main problem is that the timeline server can't receive events as fast as
> they are generated. Today, the RM system metrics publisher puts only one
> event in each request, so most of the time on the timeline side is spent
> handling HTTP headers and network-connection overhead; only a small fraction
> is spent processing the timeline event itself, which is the truly valuable
> work.
> In this issue, we add a buffer to the system metrics publisher and let the
> publisher send events to the timeline server in batches via one request.
> With the batch size set to 1000, the rate at which the timeline server
> receives events improved 100x in our experiments. We have deployed this
> function in our production environment, which accepts 2 apps in one hour,
> and it works fine.
> We add the following configuration:
> * yarn.resourcemanager.system-metrics-publisher.batch-size: the number of
>   events the system metrics publisher sends in one request. Default: 1000.
> * yarn.resourcemanager.system-metrics-publisher.buffer-size: the size of the
>   event buffer in the system metrics publisher.
> * yarn.resourcemanager.system-metrics-publisher.interval-seconds: when batch
>   publishing is enabled, we must avoid the publisher waiting for a batch to
>   fill up and holding events in the buffer for a long time. So we add
>   another thread which sends the events in the buffer periodically. This
>   config sets the interval of that periodic sending thread. Default: 60s.
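The buffer-plus-periodic-flush design described above can be sketched with a bounded queue and a daemon drainer thread. Everything below is a hedged illustration of the idea: the class, the sendBatch stub, and the use of String as a stand-in for timeline entities are assumptions, not the classes in the patch.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Sketch of batching: flush when the batch fills or an interval elapses. */
public class BatchingPublisherSketch {
  private final BlockingQueue<String> buffer; // stand-in for timeline events
  private final int batchSize;
  private final long flushIntervalMs;

  public BatchingPublisherSketch(int bufferSize, int batchSize,
      long flushIntervalMs) {
    this.buffer = new LinkedBlockingQueue<>(bufferSize);
    this.batchSize = batchSize;
    this.flushIntervalMs = flushIntervalMs;
    Thread drainer = new Thread(this::drainLoop, "smp-batch-drainer");
    drainer.setDaemon(true);
    drainer.start();
  }

  public void publish(String event) throws InterruptedException {
    buffer.put(event); // bounded: back-pressure instead of unbounded RM heap
  }

  private void drainLoop() {
    List<String> batch = new ArrayList<>(batchSize);
    long deadline = System.currentTimeMillis() + flushIntervalMs;
    while (true) {
      try {
        long wait = Math.max(1, deadline - System.currentTimeMillis());
        String event = buffer.poll(wait, TimeUnit.MILLISECONDS);
        if (event != null) {
          batch.add(event);
        }
        // Flush when the batch is full or the interval has elapsed.
        if (batch.size() >= batchSize
            || (System.currentTimeMillis() >= deadline && !batch.isEmpty())) {
          sendBatch(batch);
          batch.clear();
        }
        if (System.currentTimeMillis() >= deadline) {
          deadline = System.currentTimeMillis() + flushIntervalMs;
        }
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  private void sendBatch(List<String> batch) {
    // Placeholder: in the real patch this would be one REST call to the
    // timeline server carrying the whole batch in the request body.
    System.out.println("flushing " + batch.size() + " events");
  }
}
{code}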
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478460#comment-16478460 ]

genericqa commented on YARN-4599:
---------------------------------

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 7 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 41s | Maven dependency ordering for branch |
| +1 | mvninstall | 23m 30s | trunk passed |
| +1 | compile | 27m 25s | trunk passed |
| +1 | checkstyle | 2m 46s | trunk passed |
| +1 | mvnsite | 18m 21s | trunk passed |
| +1 | shadedclient | 30m 45s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . |
| +1 | findbugs | 3m 18s | trunk passed |
| +1 | javadoc | 5m 1s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 24m 53s | the patch passed |
| +1 | compile | 26m 16s | the patch passed |
| +1 | cc | 26m 16s | the patch passed |
| +1 | javac | 26m 16s | the patch passed |
| -0 | checkstyle | 2m 35s | root: The patch generated 1 new + 235 unchanged - 1 fixed = 236 total (was 236) |
| +1 | mvnsite | 17m 57s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 8m 36s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . |
| +1 | findbugs | 3m 22s | the patch passed |
| +1 | javadoc | 4m 46s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 14m 9s | root in the patch failed. |
| +1 | asflicense | 0m 31s | The patch does not generate ASF License warnings. |
| | | 196m 46s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.util.TestBasicDiskValidator |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-4599 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attach
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478425#comment-16478425 ]

genericqa commented on YARN-8292:
---------------------------------

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 9s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 44s | trunk passed |
| +1 | compile | 9m 17s | trunk passed |
| +1 | checkstyle | 1m 28s | trunk passed |
| +1 | mvnsite | 1m 50s | trunk passed |
| +1 | shadedclient | 14m 21s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 11s | trunk passed |
| +1 | javadoc | 1m 26s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 35s | the patch passed |
| +1 | compile | 8m 13s | the patch passed |
| +1 | javac | 8m 13s | the patch passed |
| -0 | checkstyle | 1m 25s | hadoop-yarn-project/hadoop-yarn: The patch generated 9 new + 38 unchanged - 0 fixed = 47 total (was 38) |
| +1 | mvnsite | 1m 46s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 11m 38s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 10s | the patch passed |
| +1 | javadoc | 1m 48s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 32s | hadoop-yarn-common in the patch passed. |
| -1 | unit | 64m 16s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 38s | The patch does not generate ASF License warnings. |
| | | 156m 47s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8292 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923819/YARN-8292.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 290f3268becf 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/persona
[jira] [Commented] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart
[ https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478394#comment-16478394 ]

genericqa commented on YARN-8290:
---------------------------------

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 26m 29s | trunk passed |
| +1 | compile | 0m 42s | trunk passed |
| +1 | checkstyle | 0m 37s | trunk passed |
| +1 | mvnsite | 0m 44s | trunk passed |
| +1 | shadedclient | 11m 29s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 9s | trunk passed |
| +1 | javadoc | 0m 27s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 43s | the patch passed |
| +1 | compile | 0m 39s | the patch passed |
| +1 | javac | 0m 39s | the patch passed |
| +1 | checkstyle | 0m 33s | the patch passed |
| +1 | mvnsite | 0m 41s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 11m 24s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 18s | the patch passed |
| +1 | javadoc | 0m 24s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 62m 32s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 120m 9s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8290 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923820/YARN-8290.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f6bb0e1a6c56 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / be53969 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/20759/artifact/out/whitespace-eol.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20759/testReport/ |
| Max. process+thread count | 799 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/Pre
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478387#comment-16478387 ]

genericqa commented on YARN-8292:
---------------------------------

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 14s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 45s | trunk passed |
| +1 | compile | 8m 38s | trunk passed |
| +1 | checkstyle | 1m 33s | trunk passed |
| +1 | mvnsite | 1m 41s | trunk passed |
| +1 | shadedclient | 14m 7s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 48s | trunk passed |
| +1 | javadoc | 1m 20s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 33s | the patch passed |
| +1 | compile | 7m 20s | the patch passed |
| +1 | javac | 7m 20s | the patch passed |
| -0 | checkstyle | 1m 19s | hadoop-yarn-project/hadoop-yarn: The patch generated 9 new + 42 unchanged - 0 fixed = 51 total (was 42) |
| +1 | mvnsite | 1m 34s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 18s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 0s | the patch passed |
| +1 | javadoc | 1m 16s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 10s | hadoop-yarn-common in the patch passed. |
| -1 | unit | 67m 2s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 155m 25s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoCreatedQueuePreemption |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8292 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923810/YARN-8292.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 49598110aae6 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Per
[jira] [Commented] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats
[ https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478333#comment-16478333 ]

genericqa commented on YARN-8310:
---------------------------------

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 37s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 23m 23s | trunk passed |
| +1 | compile | 0m 33s | trunk passed |
| +1 | checkstyle | 0m 23s | trunk passed |
| +1 | mvnsite | 0m 35s | trunk passed |
| +1 | shadedclient | 9m 50s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 11s | trunk passed |
| +1 | javadoc | 0m 38s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 34s | the patch passed |
| +1 | compile | 0m 30s | the patch passed |
| -1 | javac | 0m 30s | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common generated 1 new + 33 unchanged - 0 fixed = 34 total (was 33) |
| -0 | checkstyle | 0m 21s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 5 new + 35 unchanged - 0 fixed = 40 total (was 35) |
| +1 | mvnsite | 0m 32s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 56s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 15s | the patch passed |
| +1 | javadoc | 0m 34s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 11s | hadoop-yarn-common in the patch passed. |
| -1 | asflicense | 0m 20s | The patch generated 1 ASF License warnings. |
| | | 54m 21s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8310 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923816/YARN-8310.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4d6cc1cdb009 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / be53969 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| javac | https://builds.apache.org/job/PreCommit-YARN-Build/20758/artifact/out/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20758/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20758/testReport/ |
| asflicense
[jira] [Commented] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478317#comment-16478317 ]

genericqa commented on YARN-8080:
---------------------------------

(/) *+1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 34s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 43s | trunk passed |
| +1 | compile | 9m 28s | trunk passed |
| +1 | checkstyle | 1m 24s | trunk passed |
| +1 | mvnsite | 1m 39s | trunk passed |
| +1 | shadedclient | 13m 38s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 48s | trunk passed |
| +1 | javadoc | 1m 6s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 10s | the patch passed |
| +1 | compile | 7m 35s | the patch passed |
| +1 | javac | 7m 35s | the patch passed |
| -0 | checkstyle | 1m 22s | hadoop-yarn-project/hadoop-yarn: The patch generated 15 new + 121 unchanged - 2 fixed = 136 total (was 123) |
| +1 | mvnsite | 1m 31s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 15s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 37s | the patch passed |
| +1 | javadoc | 1m 4s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 11m 9s | hadoop-yarn-services-core in the patch passed. |
| +1 | unit | 0m 40s | hadoop-yarn-services-api in the patch passed. |
| +1 | unit | 0m 20s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 36s | The patch does not generate ASF License warnings. |
| | | 94m 6s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8080 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attac
[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478311#comment-16478311 ]

Eric Yang commented on YARN-8141:
---------------------------------

[~csingh] Thank you for the patch. A few nits: the FindAbsoluteMount method can be skipped; container-executor already looks up the source mount location and compares the absolute path to the white-listed mounts (YARN-5534). DockerContainers.md still has a reference to YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS; this can be removed as well.

> YARN Native Service: Respect
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> ------------------------------------------------------------------------------
>
>                 Key: YARN-8141
>                 URL: https://issues.apache.org/jira/browse/YARN-8141
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>            Reporter: Wangda Tan
>            Assignee: Chandni Singh
>            Priority: Critical
>              Labels: Docker
>         Attachments: YARN-8141.001.patch, YARN-8141.002.patch,
> YARN-8141.003.patch, YARN-8141.004.patch
>
> The existing YARN native service overwrites
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether
> the user specified it in the service spec. It is important to allow users to
> mount local files like /etc/passwd.
> The following logic inside AbstractLauncher.java overwrites the
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());{code}
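One straightforward way to respect a user-supplied value is to merge the framework-computed mounts into any existing environment entry instead of replacing it. A minimal sketch of that idea, assuming a standalone class and an addMounts helper that do not exist in AbstractLauncher; whether the real patch merges or prefers the user value outright is the patch's own design choice:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class MountEnvMergeSketch {
  static final String ENV_DOCKER_MOUNTS =
      "YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS";

  /** Append framework-computed mounts after any user-specified ones. */
  static void addMounts(Map<String, String> env,
      Map<String, String> mountPaths) {
    StringBuilder sb = new StringBuilder();
    String existing = env.get(ENV_DOCKER_MOUNTS);
    if (existing != null && !existing.isEmpty()) {
      sb.append(existing); // keep what the user asked for in the spec
    }
    for (Map.Entry<String, String> mount : mountPaths.entrySet()) {
      if (sb.length() > 0) {
        sb.append(",");
      }
      sb.append(mount.getKey()).append(":").append(mount.getValue());
    }
    env.put(ENV_DOCKER_MOUNTS, sb.toString());
  }

  public static void main(String[] args) {
    Map<String, String> env = new LinkedHashMap<>();
    env.put(ENV_DOCKER_MOUNTS, "/etc/passwd:/etc/passwd"); // user-specified
    Map<String, String> computed = new LinkedHashMap<>();
    computed.put("/hadoop/conf", "/etc/hadoop/conf"); // illustrative mount
    addMounts(env, computed);
    // Prints both the user mount and the computed mount, comma-separated.
    System.out.println(env.get(ENV_DOCKER_MOUNTS));
  }
}
{code}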
[jira] [Assigned] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart
[ https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang reassigned YARN-8290:
-------------------------------
             Assignee: Eric Yang
    Affects Version/s: 3.1.1

[~leftnoteasy] According to your suggestion, the ACL information is set too late, and killing the AM before the ACL information has propagated can cause RM recovery to load a partial application record. The suggested change is to move the ACL setup into ApplicationToSchedulerTransition; the patch moves the block of code accordingly. Let me know if this is the correct fix. Thanks.

> Yarn application failed to recover with "Error Launching job : User is not
> set in the application report" error after RM restart
> ---------------------------------------------------------------------------
>
>                 Key: YARN-8290
>                 URL: https://issues.apache.org/jira/browse/YARN-8290
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: Yesha Vora
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: YARN-8290.001.patch
>
> Scenario:
> 1) Start 5 streaming applications in the background.
> 2) Kill the active RM and cause an RM failover.
> After the RM failover, the applications failed with the below error.
> {code}
> 18/02/01 21:24:29 WARN client.RequestHedgingRMFailoverProxyProvider: Invocation returned exception on [rm2] : org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1517520038847_0003' doesn't exist in RM. Please check that the job submission was successful.
> at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:338)
> at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
> at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
> , so propagating back to caller.
> 18/02/01 21:24:29 INFO impl.YarnClientImpl: Submitted application application_1517520038847_0003
> 18/02/01 21:24:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/hrt_qa/.staging/job_1517520038847_0003
> 18/02/01 21:24:30 ERROR streaming.StreamJob: Error Launching job : User is not set in the application report
> Streaming Command Failed!
> {code}
[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miklos Szegedi updated YARN-4599:
---------------------------------
    Attachment: YARN-4599.012.patch
[jira] [Updated] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart
[ https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8290: Attachment: YARN-8290.001.patch > Yarn application failed to recover with "Error Launching job : User is not > set in the application report" error after RM restart > > > Key: YARN-8290 > URL: https://issues.apache.org/jira/browse/YARN-8290 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Priority: Major > Attachments: YARN-8290.001.patch > > > Scenario: > 1) Start 5 streaming application in background > 2) Kill Active RM and cause RM failover > After RM failover, The application failed with below error. > {code}18/02/01 21:24:29 WARN client.RequestHedgingRMFailoverProxyProvider: > Invocation returned exception on [rm2] : > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application > with id 'application_1517520038847_0003' doesn't exist in RM. Please check > that the job submission was successful. > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:338) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) > , so propagating back to caller. > 18/02/01 21:24:29 INFO impl.YarnClientImpl: Submitted application > application_1517520038847_0003 > 18/02/01 21:24:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area > /user/hrt_qa/.staging/job_1517520038847_0003 > 18/02/01 21:24:30 ERROR streaming.StreamJob: Error Launching job : User is > not set in the application report > Streaming Command Failed!{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478266#comment-16478266 ] Wangda Tan commented on YARN-8292: -- [~jlowe], I think you're correct :). I take my word back. My previous assumption: {code} Σ_{selected-container} (selected-container.resource) <= Σ_{queue} (queue.to-be-obtain), for all resource types {code} can break the case in which one starving queue needs to preempt containers from two over-utilized queues. For example: {code} queue-A, guaranteed: <30,50>, used: <40,60>. queue-B, guaranteed: <30,50>, used: <40,60> {code} Assume we have a queue C that wants 20:20 resources. In this case, the resource to obtain from each of queue-A/queue-B = 10:10. If the containers running on the system all have the same size = 20:30, then under my existing approach nothing can be preempted. This is also why some UTs failed. I just used your approach: bq. I think the check for a zero resource can be dropped and it simplifies to the toObtainAfterPreemption component-wise max'd with zero is less than the amount to obtain from the partition (after being max'd with zero). With the 0 resource type check I commented above: {code} // If a toObtain resource type == 0, set it to -1 to avoid 0 resource // type affect following doPreemption check: isAnyMajorResourceZero for (ResourceInformation ri : toObtainByPartition.getResources()) { if (ri.getValue() == 0) { ri.setValue(-1); } } {code} Now everything works. Please check the attached patch (ver.3) to see if it works. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch, YARN-8292.002.patch, > YARN-8292.003.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
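For illustration, here is a minimal standalone sketch of the quoted check under one plausible reading, using plain int vectors instead of the actual YARN {{Resource}}/{{Resources}} API; all names are illustrative and this is not the patch code:
{code:java}
public class RelaxedPreemptionCheckSketch {
  // Preempt a container only if the clipped "still to obtain" vector strictly
  // shrinks in at least one dimension. Since container sizes are
  // non-negative, max(toObtain - c, 0) <= max(toObtain, 0) always holds
  // component-wise, so "less than" reduces to progress in some resource type.
  static boolean shouldPreempt(int[] toObtain, int[] container) {
    boolean shrinks = false;
    for (int i = 0; i < toObtain.length; i++) {
      int before = Math.max(toObtain[i], 0);
      int after = Math.max(toObtain[i] - container[i], 0);
      if (after < before) {
        shrinks = true; // frees a resource the starving queue still needs
      }
    }
    return shrinks;
  }

  public static void main(String[] args) {
    // queue-A/queue-B each still need 10:10; every container is 20:30
    System.out.println(shouldPreempt(new int[] {10, 10}, new int[] {20, 30}));
  }
}
{code}
With the numbers from the example above, shouldPreempt({10, 10}, {20, 30}) returns true, so a 20:30 container can now be selected even though it is larger than the 10:10 to-obtain amount of either queue.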
[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8292: - Attachment: YARN-8292.003.patch > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch, YARN-8292.002.patch, > YARN-8292.003.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats
[ https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478258#comment-16478258 ] Robert Kanter commented on YARN-8310: - The patch adds back code very similar to the original reading code for each of the 3 tokens. It will get called if an {{InvalidProtocolBufferException}} is thrown when trying to parse it as a protobuf. Also added tests. I had to add additional code to handle the case where the old {{ContainerTokenIdentifier}} format ends early because YARN-2581 added a {{LogAggregationContext}} field to the end, which might not exist. > Handle old NMTokenIdentifier, AMRMTokenIdentifier, and > ContainerTokenIdentifier formats > --- > > Key: YARN-8310 > URL: https://issues.apache.org/jira/browse/YARN-8310 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Major > Attachments: YARN-8310.001.patch, YARN-8310.branch-2.001.patch > > > In some recent upgrade testing, we saw this error causing the NodeManager to > fail to startup afterwards: > {noformat} > org.apache.hadoop.service.ServiceStateException: > com.google.protobuf.InvalidProtocolBufferException: Protocol message > contained an invalid tag (zero). > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895) > Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol > message contained an invalid tag (zero). 
> at > com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89) > at > com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1860) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1824) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011) > at > com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686) > at > org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254) > at > org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177) > at > org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > ... 5 more > {noformat} > The NodeManager fails because it's trying to read a > {{ContainerTokenIdentifier}} in the "old" format before we changed them to > protobufs (YARN-668). This is very similar to YARN-5594 where we ran into a > similar problem with the ResourceManager and RM Delegation Tokens. > To provide a better experience, we should make the code able to read the old > format if it's unable to read it using the new format. We didn't run into > any errors with the other two types of tokens that YARN-668 incompatibly > changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix > those while we're at it.
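For illustration, a hedged sketch of the fallback pattern described in the comment above; the method and field names are taken loosely from the discussion and may not match the actual patch:
{code:java}
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import com.google.protobuf.InvalidProtocolBufferException;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos.ContainerTokenIdentifierProto;

public class TokenReadSketch {
  private ContainerTokenIdentifierProto proto;

  // Try the protobuf format first; on parse failure, fall back to the
  // pre-YARN-668 Writable layout.
  public void readFields(DataInput in) throws IOException {
    byte[] data = IOUtils.readFullyToByteArray(in); // drain remaining bytes
    try {
      proto = ContainerTokenIdentifierProto.parseFrom(data);
    } catch (InvalidProtocolBufferException e) {
      // Old format: plain Writable fields written before protobufs were used.
      // Trailing fields added later (e.g. the LogAggregationContext from
      // YARN-2581) may be absent, so the old reader must tolerate hitting EOF.
      readFieldsInOldFormat(
          new DataInputStream(new ByteArrayInputStream(data)));
    }
  }

  private void readFieldsInOldFormat(DataInputStream in) throws IOException {
    // hypothetical: read the pre-protobuf Writable fields one by one
  }
}
{code}
Buffering the identifier bytes up front is what makes the retry possible: the same byte array can be handed to either parser without re-reading the input.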
[jira] [Updated] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats
[ https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-8310: Attachment: YARN-8310.001.patch YARN-8310.branch-2.001.patch > Handle old NMTokenIdentifier, AMRMTokenIdentifier, and > ContainerTokenIdentifier formats > --- > > Key: YARN-8310 > URL: https://issues.apache.org/jira/browse/YARN-8310 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Major > Attachments: YARN-8310.001.patch, YARN-8310.branch-2.001.patch > > > In some recent upgrade testing, we saw this error causing the NodeManager to > fail to startup afterwards: > {noformat} > org.apache.hadoop.service.ServiceStateException: > com.google.protobuf.InvalidProtocolBufferException: Protocol message > contained an invalid tag (zero). > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895) > Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol > message contained an invalid tag (zero). > at > com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89) > at > com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1860) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1824) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011) > at > com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) > at > org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686) > at > org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254) > at > org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177) > at > org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) 
> ... 5 more > {noformat} > The NodeManager fails because it's trying to read a > {{ContainerTokenIdentifier}} in the "old" format before we changed them to > protobufs (YARN-668). This is very similar to YARN-5594 where we ran into a > similar problem with the ResourceManager and RM Delegation Tokens. > To provide a better experience, we should make the code able to read the old > format if it's unable to read it using the new format. We didn't run into > any errors with the other two types of tokens that YARN-668 incompatibly > changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix > those while we're at it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8310) Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats
Robert Kanter created YARN-8310: --- Summary: Handle old NMTokenIdentifier, AMRMTokenIdentifier, and ContainerTokenIdentifier formats Key: YARN-8310 URL: https://issues.apache.org/jira/browse/YARN-8310 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter In some recent upgrade testing, we saw this error causing the NodeManager to fail to startup afterwards: {noformat} org.apache.hadoop.service.ServiceStateException: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero). at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895) Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero). at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89) at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108) at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1860) at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.(YarnSecurityTokenProtos.java:1824) at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016) at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011) at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686) at org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254) at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177) at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) ... 5 more {noformat} The NodeManager fails because it's trying to read a {{ContainerTokenIdentifier}} in the "old" format before we changed them to protobufs (YARN-668). This is very similar to YARN-5594 where we ran into a similar problem with the ResourceManager and RM Delegation Tokens. 
To provide a better experience, we should make the code able to read the old format if it's unable to read it using the new format. We didn't run into any errors with the other two types of tokens that YARN-668 incompatibly changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix those while we're at it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478246#comment-16478246 ] Wangda Tan commented on YARN-8292: -- Attached ver.2 patch, which fixed 0 value rare-resource problem mentioned by Jason. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch, YARN-8292.002.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8292: - Attachment: YARN-8292.002.patch > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch, YARN-8292.002.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478244#comment-16478244 ] Wangda Tan commented on YARN-8292: -- [~eepayne], It is actually enabled in {{setup()}}. :) [~jlowe], I understand your suggestion now, but simply dropping the check as you mentioned: bq. I think the check for a zero resource can be dropped and it simplifies to the toObtainAfterPreemption component-wise max'd with zero is less than the amount to obtain from the partition (after being max'd with zero). is not enough. The reason is that we want to make sure no over-preemption happens. For example, if res-to-obtain = (3, 0, 0) and a container has size = (4, 1, 0) (the 3rd type is 0 for both), we don't want the preemption to happen, because it will leave the queue under-utilized and can preempt more containers than required. We need to make sure that: {code} Σ_{selected-container} (selected-container.resource) <= Σ_{queue} (queue.to-be-obtain), for all resource types {code} In my previous patch, as you mentioned, if some resource type is always 0, it will invalidate the check. So I added a check: {code} // If a toObtain resource type == 0, set it to -1 to avoid 0 resource // type affect following doPreemption check: isAnyMajorResourceZero for (ResourceInformation ri : toObtainByPartition.getResources()) { if (ri.getValue() == 0) { ri.setValue(-1); } } {code} before {code} if (Resources.greaterThan(rc, clusterResource, toObtainByPartition, Resources.none()) {code} It looks like this solves the problem. Please let me know if you think differently. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
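For illustration, a standalone sketch of the effect of that guard, again with plain int vectors rather than the actual {{Resource}} API (illustrative only, under the assumption that zero-valued to-obtain types have been rewritten to -1 as in the snippet above):
{code:java}
public class ZeroResourceGuardSketch {
  // A resource type with nothing left to obtain blocks selecting any
  // container that uses that type, preventing over-preemption in dimensions
  // the starving queue does not need (the -1 rewrite above makes the real
  // Resource comparison behave this way).
  static boolean mayPreempt(int[] toObtain, int[] container) {
    for (int i = 0; i < toObtain.length; i++) {
      if (toObtain[i] <= 0 && container[i] > 0) {
        return false; // would take a resource nobody is asking for
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // res-to-obtain = (3, 0, 0), container = (4, 1, 0): must not be preempted
    System.out.println(mayPreempt(new int[] {3, 0, 0}, new int[] {4, 1, 0}));
  }
}
{code}
With res-to-obtain = (3, 0, 0) and a container of size (4, 1, 0), mayPreempt returns false, matching the example in the comment above.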
[jira] [Commented] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478235#comment-16478235 ] Suma Shivaprasad commented on YARN-8080: Thanks [~eyang]. I have addressed the review comments for terminating according to the restart policy and fixed most of the CS issues. > YARN native service should support component restart policy > --- > > Key: YARN-8080 > URL: https://issues.apache.org/jira/browse/YARN-8080 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8080.001.patch, YARN-8080.002.patch, > YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, > YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, > YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch, > YARN-8080.014.patch, YARN-8080.015.patch, YARN-8080.016.patch > > > Existing native service assumes the service is long running and never > finishes. Containers will be restarted even if exit code == 0. > To support boarder use cases, we need to allow restart policy of component > specified by users. Propose to have following policies: > 1) Always: containers always restarted by framework regardless of container > exit status. This is existing/default behavior. > 2) Never: Do not restart containers in any cases after container finishes: To > support job-like workload (for example Tensorflow training job). If a task > exit with code == 0, we should not restart the task. This can be used by > services which is not restart/recovery-able. > 3) On-failure: Similar to above, only restart task with exitcode != 0. > Behaviors after component *instance* finalize (Succeeded or Failed when > restart_policy != ALWAYS): > 1) For single component, single instance: complete service. > 2) For single component, multiple instance: other running instances from the > same component won't be affected by the finalized component instance. Service > will be terminated once all instances finalized. > 3) For multiple components: Service will be terminated once all components > finalized. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8309) Diagnostic message for yarn service app failure due token renewal should be improved
Yesha Vora created YARN-8309: Summary: Diagnostic message for yarn service app failure due token renewal should be improved Key: YARN-8309 URL: https://issues.apache.org/jira/browse/YARN-8309 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora When a Yarn service application failed due to a token renewal issue, the diagnostic message was unclear. {code:java} Application application_1526413043392_0002 failed 20 times due to AM Container for appattempt_1526413043392_0002_20 exited with exitCode: 1 Failing this attempt.Diagnostics: [2018-05-15 23:15:28.779]Exception from container-launch. Container id: container_e04_1526413043392_0002_20_01 Exit code: 1 Exception message: Launch container failed Shell output: main : command provided 1 main : run as user is hbase main : requested yarn user is hbase Getting exit code file... Creating script paths... Writing pid file... Writing to tmp file /grid/0/hadoop/yarn/local/nmPrivate/application_1526413043392_0002/container_e04_1526413043392_0002_20_01/container_e04_1526413043392_0002_20_01.pid.tmp Writing to cgroup task files... Creating local dirs... Launching container... Getting exit code file... Creating script paths... [2018-05-15 23:15:28.806]Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : [2018-05-15 23:15:28.807]Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : For more detailed output, check the application tracking page: https://xxx:8090/cluster/app/application_1526413043392_0002 Then click on links to logs of each attempt. . Failing the application.{code} Here, the diagnostic message should be improved to specify that the AM is failing due to token renewal issues. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8080: --- Attachment: YARN-8080.016.patch > YARN native service should support component restart policy > --- > > Key: YARN-8080 > URL: https://issues.apache.org/jira/browse/YARN-8080 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8080.001.patch, YARN-8080.002.patch, > YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, > YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, > YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch, > YARN-8080.014.patch, YARN-8080.015.patch, YARN-8080.016.patch > > > Existing native service assumes the service is long running and never > finishes. Containers will be restarted even if exit code == 0. > To support boarder use cases, we need to allow restart policy of component > specified by users. Propose to have following policies: > 1) Always: containers always restarted by framework regardless of container > exit status. This is existing/default behavior. > 2) Never: Do not restart containers in any cases after container finishes: To > support job-like workload (for example Tensorflow training job). If a task > exit with code == 0, we should not restart the task. This can be used by > services which is not restart/recovery-able. > 3) On-failure: Similar to above, only restart task with exitcode != 0. > Behaviors after component *instance* finalize (Succeeded or Failed when > restart_policy != ALWAYS): > 1) For single component, single instance: complete service. > 2) For single component, multiple instance: other running instances from the > same component won't be affected by the finalized component instance. Service > will be terminated once all instances finalized. > 3) For multiple components: Service will be terminated once all components > finalized. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8308) Yarn service app fails due to issues with Renew Token
Yesha Vora created YARN-8308: Summary: Yarn service app fails due to issues with Renew Token Key: YARN-8308 URL: https://issues.apache.org/jira/browse/YARN-8308 Project: Hadoop YARN Issue Type: Bug Components: yarn-native-services Affects Versions: 3.1.0 Reporter: Yesha Vora Run Yarn service application beyond dfs.namenode.delegation.token.max-lifetime. Here, yarn service application fails with below error. {code} 2018-05-15 23:14:35,652 [main] WARN ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ 2018-05-15 23:14:35,654 [main] INFO service.AbstractService - Service Service Master failed in state INITED org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491) at org.apache.hadoop.ipc.Client.call(Client.java:1437) at org.apache.hadoop.ipc.Client.call(Client.java:1347) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654) at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569) at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581) at org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182) at org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337) at org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242) at org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91) at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316) 2018-05-15 23:14:35,659 [main] INFO service.ServiceMaster - Stopping app master 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting service master org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316) Caus
[jira] [Commented] (YARN-8103) Add CLI interface to query node attributes
[ https://issues.apache.org/jira/browse/YARN-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478211#comment-16478211 ] genericqa commented on YARN-8103: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} YARN-3409 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 6m 53s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 30s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 33s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 39s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 7m 1s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 45s{color} | {color:green} YARN-3409 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 45s{color} | {color:green} YARN-3409 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 21s{color} | {color:red} hadoop-sls in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 25m 7s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 25m 7s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 25m 7s{color} | {color:red} root in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 46s{color} | {color:orange} root: The patch generated 14 new + 187 unchanged - 8 fixed = 201 total (was 195) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 27s{color} | {color:red} hadoop-sls in the patch failed. {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 23s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 14s{color} | {color:green} There were no new shelldocs issues. 
{color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 2 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 47s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 24s{color} | {color:red} hadoop-sls in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 16s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:
[jira] [Commented] (YARN-8273) Log aggregation does not warn if HDFS quota in target directory is exceeded
[ https://issues.apache.org/jira/browse/YARN-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478207#comment-16478207 ] Robert Kanter commented on YARN-8273: - Thanks for the patch. One question: - Why make {{LogAggregationDFSException}} an undeclared exception? Why not make it a subclass of {{YarnException}} and declare it? -- Also, if we do keep it as undeclared, it should be a subclass of {{YarnRuntimeException}} instead of {{RuntimeException}}. I was going to suggest we catch the {{DSQuotaExceededException}} when closing {{writer}}, but it turns out that {{TFile#close}} does _not_ close the underlying {{FSDataOutputStream}}. That's probably not what most people are expecting, but there's nothing we can do about that now. :/ > Log aggregation does not warn if HDFS quota in target directory is exceeded > --- > > Key: YARN-8273 > URL: https://issues.apache.org/jira/browse/YARN-8273 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.1.0 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Attachments: YARN-8273.000.patch, YARN-8273.001.patch, > YARN-8273.002.patch > > > It appears that if an HDFS space quota is set on a target directory for log > aggregation and the quota is already exceeded when log aggregation is > attempted, zero-byte log files will be written to the HDFS directory, however > NodeManager logs do not reflect a failure to write the files successfully > (i.e. there are no ERROR or WARN messages to this effect). > An improvement may be worth investigating to alert users to this scenario, as > otherwise logs for a YARN application may be missing both on HDFS and locally > (after local log cleanup is done) and the user may not otherwise be informed. > Steps to reproduce: > * Set a small HDFS space quota on /tmp/logs/username/logs (e.g. 2MB) > * Write files to HDFS such that /tmp/logs/username/logs is almost 2MB full > * Run a Spark or MR job in the cluster > * Observe that zero byte files are written to HDFS after job completion > * Observe that YARN container logs are also not present on the NM hosts (or > are deleted after yarn.nodemanager.delete.debug-delay-sec) > * Observe that no ERROR or WARN messages appear to be logged in the NM role > log -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
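To make the last point concrete, a rough sketch of where a quota error would have to be caught, given that {{TFile#close}} leaves the underlying stream open; the method shape is invented for the example, and {{LogAggregationDFSException}} is stubbed in as a stand-in for the wrapper type under discussion:
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.hdfs.protocol.DSQuotaExceededException;
import org.apache.hadoop.io.file.tfile.TFile;

public class LogCloseSketch {
  // TFile.Writer#close does not close the FSDataOutputStream it wraps, so a
  // quota violation typically surfaces only when the underlying stream is
  // closed and its buffered data is flushed out to HDFS.
  static void closeAggregatedLog(TFile.Writer writer, FSDataOutputStream out)
      throws IOException {
    writer.close(); // closes the TFile structures only
    try {
      out.close(); // flush to HDFS; the DSQuotaExceededException shows up here
    } catch (DSQuotaExceededException e) {
      throw new LogAggregationDFSException(e);
    }
  }

  // Stand-in for the patch's exception type; the thread above debates whether
  // it should instead extend YarnException or YarnRuntimeException.
  static class LogAggregationDFSException extends RuntimeException {
    LogAggregationDFSException(Throwable cause) {
      super(cause);
    }
  }
}
{code}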
[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.
[ https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478184#comment-16478184 ] Eric Payne commented on YARN-4781: -- Hi [~sunilg]. Will you have an opportunity to review the latest patch? > Support intra-queue preemption for fairness ordering policy. > > > Key: YARN-4781 > URL: https://issues.apache.org/jira/browse/YARN-4781 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Eric Payne >Priority: Major > Attachments: YARN-4781.001.patch, YARN-4781.002.patch, > YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.patch > > > We introduced fairness queue policy since YARN-3319, which will let large > applications make progresses and not starve small applications. However, if a > large application takes the queue’s resources, and containers of the large > app has long lifespan, small applications could still wait for resources for > long time and SLAs cannot be guaranteed. > Instead of wait for application release resources on their own, we need to > preempt resources of queue with fairness policy enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478178#comment-16478178 ] genericqa commented on YARN-4599: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 11s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 31m 25s{color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 30m 31s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 3s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 0s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 27m 0s{color} | {color:red} root generated 5 new + 6 unchanged - 0 fixed = 11 total (was 6) {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 27m 0s{color} | {color:red} root generated 188 new + 1277 unchanged - 0 fixed = 1465 total (was 1277) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 41s{color} | {color:green} root: The patch generated 0 new + 235 unchanged - 1 fixed = 235 total (was 236) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 17m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 8m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 52s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}150m 24s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}335m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.TestDFSInotifyEventInputStre
[jira] [Commented] (YARN-8071) Add ability to specify nodemanager environment variables individually
[ https://issues.apache.org/jira/browse/YARN-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478163#comment-16478163 ] Hudson commented on YARN-8071: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14213 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14213/]) YARN-8071. Add ability to specify nodemanager environment variables (jlowe: rev be539690477f7fee8f836bf3612cbe7ff6a3506e) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java > Add ability to specify nodemanager environment variables individually > - > > Key: YARN-8071 > URL: https://issues.apache.org/jira/browse/YARN-8071 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8071.001.patch, YARN-8071.002.patch, > YARN-8071.003.patch > > > YARN-6830 describes a problem where environment variables that contain commas > cannot be specified via {{-Dmapreduce.map.env}}. > For example: > {{-Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS=/tmp/foo,/tmp/bar"}} > will set {{MOUNTS}} to {{/tmp/foo}} > In that Jira, [~aw] suggested that we change the API to provide a way to > specify environment variables individually, the same way that Spark does. > {quote}Rather than fight with a regex why not redefine the API instead? > > -Dmapreduce.map.env.MODE=bar > -Dmapreduce.map.env.IMAGE_NAME=foo > -Dmapreduce.map.env.MOUNTS=/tmp/foo,/tmp/bar > ... > e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar > This greatly simplifies the input validation needed and makes it clear what > is actually being defined. > {quote} > The mapreduce properties were dealt with in [MAPREDUCE-7069]. This Jira will > address the YARN properties. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478157#comment-16478157 ] Eric Payne commented on YARN-8292: -- bq. you can check org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyInterQueueWithDRF#test3ResourceTypesInterQueuePremption test for details. This test is not actually enabling DRF. You need to add the 5th argument to {{buildEnv()}}: {code} -buildEnv(labelsConfig, nodesConfig, queuesConfig, appsConfig); +buildEnv(labelsConfig, nodesConfig, queuesConfig, appsConfig, true); {code} > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8307) [atsv2 read acls] Coprocessor for reader authorization check
[ https://issues.apache.org/jira/browse/YARN-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-8307: - Issue Type: Sub-task (was: Bug) Parent: YARN-7055 > [atsv2 read acls] Coprocessor for reader authorization check > > > Key: YARN-8307 > URL: https://issues.apache.org/jira/browse/YARN-8307 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vrushali C >Assignee: Vrushali C >Priority: Major > > Jira to track coprocessor creation for reader authorization check when > security is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8307) [atsv2 read acls] Coprocessor for reader authorization check
Vrushali C created YARN-8307: Summary: [atsv2 read acls] Coprocessor for reader authorization check Key: YARN-8307 URL: https://issues.apache.org/jira/browse/YARN-8307 Project: Hadoop YARN Issue Type: Bug Reporter: Vrushali C Assignee: Vrushali C Jira to track coprocessor creation for reader authorization check when security is enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8071) Add ability to specify nodemanager environment variables individually
[ https://issues.apache.org/jira/browse/YARN-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478114#comment-16478114 ] Jason Lowe commented on YARN-8071: -- bq. I assume this windows-only test must be failing without this fix. Ah, my bad. I missed the fact that this test was Windows-only, so that's why it was "passing" even without the change. +1 for the latest patch. Committing this. > Add ability to specify nodemanager environment variables individually > - > > Key: YARN-8071 > URL: https://issues.apache.org/jira/browse/YARN-8071 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-8071.001.patch, YARN-8071.002.patch, > YARN-8071.003.patch > > > YARN-6830 describes a problem where environment variables that contain commas > cannot be specified via {{-Dmapreduce.map.env}}. > For example: > {{-Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS=/tmp/foo,/tmp/bar"}} > will set {{MOUNTS}} to {{/tmp/foo}} > In that Jira, [~aw] suggested that we change the API to provide a way to > specify environment variables individually, the same way that Spark does. > {quote}Rather than fight with a regex why not redefine the API instead? > > -Dmapreduce.map.env.MODE=bar > -Dmapreduce.map.env.IMAGE_NAME=foo > -Dmapreduce.map.env.MOUNTS=/tmp/foo,/tmp/bar > ... > e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar > This greatly simplifies the input validation needed and makes it clear what > is actually being defined. > {quote} > The mapreduce properties were dealt with in [MAPREDUCE-7069]. This Jira will > address the YARN properties. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478104#comment-16478104 ] Jason Lowe commented on YARN-8292: -- bq. After preemption, there're at least one 0 major resources (which indicates that the queue is still satisfied after preemption). I'm still confused by this point. How is that not going to be always true when the cluster has a rarely-used resource dimension? For example, let's say GPU is one of the dimensions, and all the apps that want to use GPUs are all running in only one of many queues on the cluster. The other queues will all have zero for their GPU usage, and any cross-queue preemptions between those other queues will all have zero in the GPU resource for toObtainFromPartition and toObtainAfterPreemption. In other words, it effectively disabled the less than Resources.none check when comparing preemptions between these non-GPU-using queues because GPU will always be zero so isAnyMajorResourceZero will always be true. Or am I missing something? For the case of not wanting to kill a container that is (4, 1, 1) when the ask is only (3, -1, -1), the comparison against Resources.none should cover that. What is an example scenario where the additional check if any resource dimension is zero is needed to do the right thing? From the scenario I described above, I can see where it can (incorrectly?) override the comparison against Resources.none and preempt a (4, 1, 0) container when the ask is only (3, -1, 0). > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
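To make the zero-dimension concern concrete, here is a toy model that uses plain long[] vectors in place of the real Resource/ResourceCalculator classes (a simplification, not the actual scheduler code). With GPU as a third, unused dimension, an "any resource is zero" check stays true even when the preemption overshoots on every dimension that was actually requested.
{code:java}
// Toy model of the doPreempt discussion above; vectors are (memory, vcores, gpu).
public class ZeroDimensionSketch {
  static boolean isAnyResourceZero(long[] r) {
    for (long v : r) {
      if (v == 0) {
        return true;
      }
    }
    return false;
  }

  static long[] subtract(long[] a, long[] b) {
    long[] out = new long[a.length];
    for (int i = 0; i < a.length; i++) {
      out[i] = a[i] - b[i];
    }
    return out;
  }

  public static void main(String[] args) {
    long[] toObtain = {3, -1, 0};  // ask from a non-GPU-using queue
    long[] container = {4, 1, 0};  // candidate container to preempt
    long[] after = subtract(toObtain, container); // (-1, -2, 0)

    // Memory and vcores both went negative (over-preemption), yet the
    // zero GPU component alone keeps the check true.
    System.out.println(isAnyResourceZero(after)); // prints: true
  }
}
{code}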
[jira] [Commented] (YARN-8286) Add NMClient callback on container relaunch
[ https://issues.apache.org/jira/browse/YARN-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478085#comment-16478085 ] Jason Lowe commented on YARN-8286: -- This could be implemented as something in the AM/NM client connection, but it would require the AM to keep a long-lived connection to every NM that has containers running on it. I think a simpler approach for the AM is to get this information via the same channel it gets other container notifications like allocated, completed, etc., and that's the AM-RM heartbeat (i.e.: ApplicationMasterProtocol#allocate). Currently when a container completes on the NM side, the NM lets the RM know via an out-of-band heartbeat, and the RM in turn lets the AM know on the next AM heartbeat. I think it would be relatively straightforward to have the NM notify the RM of any container relaunches, just like it already does for container launches and completions. The RM can then relay this information to the AM. Then the AM wouldn't need to stay connected to every NM for relaunch status, and the container relaunch events would arrive at the AM just like container completion events do today, without any new connections required. Thoughts? > Add NMClient callback on container relaunch > --- > > Key: YARN-8286 > URL: https://issues.apache.org/jira/browse/YARN-8286 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Priority: Critical > > The AM may need to perform actions when a container has been relaunched. For > example, the service AM would want to change the state it has recorded for > the container and retrieve new container status for the container, in case > the container IP has changed. (The NM would also need to remove the IP it has > stored for the container, so container status calls don't return an IP for a > container that is not currently running.) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
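As a sketch of what the AM-side consumption could look like under this proposal, mirroring how completion events are consumed today: note that both the onContainersRelaunched callback and a relaunch list on the allocate response are hypothetical, invented here purely for illustration.
{code:java}
import java.util.List;
import org.apache.hadoop.yarn.api.records.ContainerId;

// Hypothetical sketch only: nothing below is an existing YARN interface.
interface RelaunchCallbackHandler {
  // Would be invoked when an allocate response reports NM-side relaunches.
  void onContainersRelaunched(List<ContainerId> relaunched);
}

class ServiceMasterHandler implements RelaunchCallbackHandler {
  @Override
  public void onContainersRelaunched(List<ContainerId> relaunched) {
    for (ContainerId id : relaunched) {
      // Re-fetch the container status, since e.g. the container IP may
      // have changed across the relaunch (the scenario this issue describes).
      refreshStatus(id);
    }
  }

  private void refreshStatus(ContainerId id) {
    // Placeholder for an NMClient#getContainerStatus call or equivalent.
  }
}
{code}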
[jira] [Commented] (YARN-7933) [atsv2 read acls] Add TimelineWriter#writeDomain
[ https://issues.apache.org/jira/browse/YARN-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478053#comment-16478053 ] Hudson commented on YARN-7933: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14212 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14212/]) YARN-7933. [atsv2 read acls] Add TimelineWriter#writeDomain. (Rohith (haibochen: rev e3b7d7ac1694b8766ae11bc7e8ecf09763bb26db) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/TimelineWriter.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/collector/TestTimelineCollector.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollector.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/TestHBaseTimelineStorageDomain.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/domain/DomainTableRW.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/HBaseTimelineWriterImpl.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timelineservice/TimelineDomain.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorWebService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/FileSystemTimelineWriterImpl.java > [atsv2 read acls] Add TimelineWriter#writeDomain > - > > Key: YARN-7933 > URL: https://issues.apache.org/jira/browse/YARN-7933 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vrushali C >Assignee: Rohith Sharma K S >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-7933.01.patch, YARN-7933.02.patch, > YARN-7933.03.patch, YARN-7933.04.patch, YARN-7933.05.patch, YARN-7933.06.patch > > > > Add an API TimelineWriter#writeDomain for writing the domain info -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8305) [UI2] No information available per container about the memory/vcores
[ https://issues.apache.org/jira/browse/YARN-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8305: - Reporter: Sumana Sathish (was: Gergely Novák) > [UI2] No information available per container about the memory/vcores > > > Key: YARN-8305 > URL: https://issues.apache.org/jira/browse/YARN-8305 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn-ui-v2 >Reporter: Sumana Sathish >Assignee: Gergely Novák >Priority: Major > Attachments: YARN-8305.001.patch > > > In the Applications > App > Attempts > Attempt page, the Containers panel > shows information about container start time, status, etc., but not about > container memory/vcores used. > Discovered by [~ssath...@hortonworks.com]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8292: - Description: This is an example of the problem: {code} // guaranteed, max,used, pending "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c {code} There're 3 resource types. Total resource of the cluster is 30:18:6 For both of a/b, there're 3 containers running, each of container is 2:2:1. Queue c uses 0 resource, and have 1:1:1 pending resource. Under existing logic, preemption cannot happen. was: This is an example of the problem: (Same if we have more than 2 resources) Let's say we have 3 queues A/B/C. All containers with equal size <2,3> ||Queue||Guaranteed||Used ||Pending|| |A|<20, 10>|<20,30>| | |B|<20, 10>|0|0| |C|<20, 10>|0|<20, 30>| | | | | | Under current logic, A's calculated to-preempt (how much resource other queue can preempt) will be <0, 20>. The preemption will not happen. However, under the context of DRC, queue A is using more resource than guaranteed, so queue C will be starved > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8292: - Fix Version/s: (was: 3.1.1) (was: 3.2.0) > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8292: - Target Version/s: 3.2.0, 3.1.1 > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch > > > This is an example of the problem: > > {code} > // guaranteed, max,used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There're 3 resource types. Total resource of the cluster is 30:18:6 > For both of a/b, there're 3 containers running, each of container is 2:2:1. > Queue c uses 0 resource, and have 1:1:1 pending resource. > Under existing logic, preemption cannot happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478026#comment-16478026 ] Wangda Tan edited comment on YARN-8292 at 5/16/18 8:16 PM: --- Thanks [~eepayne], Let me revise the example a bit, I was trying to simplify it to avoid confusing to people, I think the example is wrong. Here's one I was using in test: {code} // guaranteed, max,used, pending "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c {code} There're 3 resource types. Total resource of the cluster is 30:18:6 For both of a/b, there're 3 containers running, each of container is 2:2:1. Queue c uses 0 resource, and have 1:1:1 pending resource. Prior to the attached patch. The preemption cannot happen, you can check org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyInterQueueWithDRF#test3ResourceTypesInterQueuePreemption test for details. was (Author: leftnoteasy): Thanks [~eepayne], Let me revise the example a bit, I was trying to simplify it to avoid confusing to people, I think the example is wrong. Here's one I was using in test: {code} // guaranteed, max,used, pending "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c {code} There're 3 resource types. Total resource of the cluster is 30:18:6 For both of a/b, there're 3 containers running, each of container is 2:2:1 Prior to the attached patch. The preemption cannot happen, you can check org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyInterQueueWithDRF#test3ResourceTypesInterQueuePreemption test for details. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8292.001.patch > > > > This is an example of the problem: (Same if we have more than 2 resources) > > Let's say we have 3 queues A/B/C. All containers with equal size <2,3> > > ||Queue||Guaranteed||Used ||Pending|| > |A|<20, 10>|<20,30>| | > |B|<20, 10>|0|0| > |C|<20, 10>|0|<20, 30>| > | | | | | > > Under current logic, A's calculated to-preempt (how much resource other queue > can preempt) will be <0, 20>. The preemption will not happen. However, under > the context of DRC, queue A is using more resource than guaranteed, so queue > C will be starved -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478026#comment-16478026 ] Wangda Tan commented on YARN-8292: -- Thanks [~eepayne], Let me revise the example a bit. I was trying to simplify it to avoid confusing people, but I think the example was wrong. Here's the one I was using in the test: {code} // guaranteed, max,used, pending "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c {code} There're 3 resource types. Total resource of the cluster is 30:18:6. For both of a/b, there're 3 containers running, and each container is 2:2:1. Prior to the attached patch, the preemption could not happen; you can check the org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyInterQueueWithDRF#test3ResourceTypesInterQueuePreemption test for details. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8292.001.patch > > > > This is an example of the problem: (Same if we have more than 2 resources) > > Let's say we have 3 queues A/B/C. All containers with equal size <2,3> > > ||Queue||Guaranteed||Used ||Pending|| > |A|<20, 10>|<20,30>| | > |B|<20, 10>|0|0| > |C|<20, 10>|0|<20, 30>| > | | | | | > > Under current logic, A's calculated to-preempt (how much resource other queue > can preempt) will be <0, 20>. The preemption will not happen. However, under > the context of DRC, queue A is using more resource than guaranteed, so queue > C will be starved -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478004#comment-16478004 ] Miklos Szegedi commented on YARN-4599: -- [~snemeth], thanks for the review. oomHandlerTemp is necessary since we may throw an exception after we set it. External code optimized by the JVM (which may publish a reference before the constructor has finished) could observe a partially constructed copy of this object if the exception is thrown. It is better to set the fields only once no exception can be thrown anymore. 3) This code has to be very fast, since it may do the same conversion for 1000s of containers, so I will keep the plain multiplication instead of calling into external code that deals with strings. > Set OOM control for memory cgroups > -- > > Key: YARN-4599 > URL: https://issues.apache.org/jira/browse/YARN-4599 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Miklos Szegedi >Priority: Major > Labels: oct16-medium > Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, > YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, > YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, > YARN-4599.007.patch, YARN-4599.008.patch, YARN-4599.009.patch, > YARN-4599.010.patch, YARN-4599.011.patch, YARN-4599.sandflee.patch, > yarn-4599-not-so-useful.patch > > > YARN-1856 adds memory cgroups enforcing support. We should also explicitly > set OOM control so that containers are not killed as soon as they go over > their usage. Today, one could set the swappiness to control this, but > clusters with swap turned off exist. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
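The construct-then-publish pattern defended above looks roughly like the following minimal sketch; the class and member names are made up for illustration and are not the actual patch code.
{code:java}
// Illustrative only: shows why building into a temp and assigning the field
// last keeps a partially initialized object from ever being published.
class OomListenerHolder {
  private Thread oomHandler; // only ever holds a fully initialized handler

  void initialize(String cgroupPath) {
    // Build into a local first. If the validation below throws, the field
    // is never assigned, so no other code (even with JVM write reordering
    // around construction) can observe a half-built handler.
    Thread oomHandlerTemp = new Thread(() -> handleOom(cgroupPath));
    validate(cgroupPath); // may throw
    oomHandler = oomHandlerTemp; // publish only on the success path
  }

  private void validate(String path) {
    if (path == null || path.isEmpty()) {
      throw new IllegalArgumentException("empty cgroup path");
    }
  }

  private void handleOom(String path) {
    // OOM handling would go here.
  }
}
{code}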
[jira] [Comment Edited] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477999#comment-16477999 ] Wangda Tan edited comment on YARN-8292 at 5/16/18 8:08 PM: --- [~jlowe], thanks for your review, bq. I think the check for a zero resource can be dropped and it simplifies to the toObtainAfterPreemption component-wise max'd with zero is less than the amount to obtain from the partition (after being max'd with zero). In other words, we want to preempt as long as we have some resources we want to obtain from the partition and preempting the container makes progress on at least one of the resource dimensions being requested from the partition. The second part is correct {{..at least one of the resource dimensions being requested from the partition}}, that's why we added following check: {code} 198 doPreempt = doPreempt && (Resources.lessThan(rc, clusterResource, 199 Resources 200 .componentwiseMax(toObtainAfterPreemption, Resources.none()), 201 Resources.componentwiseMax(toObtainByPartition, Resources.none(; {code} The check of {{toObtainAfterPreemption}} is to make sure we will not do over-preemption. For example, if a queue's res-to-obtain = (3,-1,-1), and a container is (4,1,1). Even if preempt the container can make positive contribution, we will not do this because after preemption, the queue becomes an under-utilized queue and it may preempt resources from other queues. Following logics are mostly to cover two cases to avoid over-preemption: {code} 195 doPreempt = Resources.greaterThanOrEqual(rc, clusterResource, 196 toObtainAfterPreemption, Resources.none()) || Resources 197 .isAnyMajorResourceZero(rc, toObtainAfterPreemption); {code} a. After preemption, there're some positive major resources. b. After preemption, there're at least one 0 major resources (which indicates that the queue is still satisfied after preemption). Please let me know if you still have any other questions. was (Author: leftnoteasy): [~jlowe], thanks for your review, bq. I think the check for a zero resource can be dropped and it simplifies to the toObtainAfterPreemption component-wise max'd with zero is less than the amount to obtain from the partition (after being max'd with zero). In other words, we want to preempt as long as we have some resources we want to obtain from the partition and preempting the container makes progress on at least one of the resource dimensions being requested from the partition. The second part is correct, that's why we added following check: {code} 198 doPreempt = doPreempt && (Resources.lessThan(rc, clusterResource, 199 Resources 200 .componentwiseMax(toObtainAfterPreemption, Resources.none()), 201 Resources.componentwiseMax(toObtainByPartition, Resources.none(; {code} The check of {{toObtainAfterPreemption}} is to make sure we will not do over-preemption. For example, if a queue's res-to-obtain = (3,-1,-1), and a container is (4,1,1). Even if preempt the container can make positive contribution, we will not do this because after preemption, the queue becomes an under-utilized queue and it may preempt resources from other queues. Following logics are mostly to cover two cases to avoid over-preemption: {code} 195 doPreempt = Resources.greaterThanOrEqual(rc, clusterResource, 196 toObtainAfterPreemption, Resources.none()) || Resources 197 .isAnyMajorResourceZero(rc, toObtainAfterPreemption); {code} a. After preemption, there're some positive major resources. b. 
After preemption, there're at least one 0 major resources (which indicates that the queue is still satisfied after preemption). Please let me know if you still have any other questions. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8292.001.patch > > > > This is an example of the problem: (Same if we have more than 2 resources) > > Let's say we have 3 queues A/B/C. All containers with equal size <2,3> > > ||Queue||Guaranteed||Used ||Pending|| > |A|<20, 10>|<20,30>| | > |B|<20, 10>|0|0| > |C|<20, 10>|0|<20, 30>| > | | | | | > > Under current logic, A's calculated to-preempt (how much resource other queue > can preempt) will be <0, 20>. The preemption will not happen. However, under > the context of DRC, queue A is using more resource than guaranteed, so queue > C will be starved -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477999#comment-16477999 ] Wangda Tan commented on YARN-8292: -- [~jlowe], thanks for your review, bq. I think the check for a zero resource can be dropped and it simplifies to the toObtainAfterPreemption component-wise max'd with zero is less than the amount to obtain from the partition (after being max'd with zero). In other words, we want to preempt as long as we have some resources we want to obtain from the partition and preempting the container makes progress on at least one of the resource dimensions being requested from the partition. The second part is correct, that's why we added following check: {code} 198 doPreempt = doPreempt && (Resources.lessThan(rc, clusterResource, 199 Resources 200 .componentwiseMax(toObtainAfterPreemption, Resources.none()), 201 Resources.componentwiseMax(toObtainByPartition, Resources.none(; {code} The check of {{toObtainAfterPreemption}} is to make sure we will not do over-preemption. For example, if a queue's res-to-obtain = (3,-1,-1), and a container is (4,1,1). Even if preempt the container can make positive contribution, we will not do this because after preemption, the queue becomes an under-utilized queue and it may preempt resources from other queues. Following logics are mostly to cover two cases to avoid over-preemption: {code} 195 doPreempt = Resources.greaterThanOrEqual(rc, clusterResource, 196 toObtainAfterPreemption, Resources.none()) || Resources 197 .isAnyMajorResourceZero(rc, toObtainAfterPreemption); {code} a. After preemption, there're some positive major resources. b. After preemption, there're at least one 0 major resources (which indicates that the queue is still satisfied after preemption). Please let me know if you still have any other questions. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8292.001.patch > > > > This is an example of the problem: (Same if we have more than 2 resources) > > Let's say we have 3 queues A/B/C. All containers with equal size <2,3> > > ||Queue||Guaranteed||Used ||Pending|| > |A|<20, 10>|<20,30>| | > |B|<20, 10>|0|0| > |C|<20, 10>|0|<20, 30>| > | | | | | > > Under current logic, A's calculated to-preempt (how much resource other queue > can preempt) will be <0, 20>. The preemption will not happen. However, under > the context of DRC, queue A is using more resource than guaranteed, so queue > C will be starved -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478000#comment-16478000 ] Eric Payne commented on YARN-8292: -- {quote} ||Queue||Guaranteed||Used ||Pending|| |A|<20, 10>|<20,30>| | |B|<20, 10>|0|0| |C|<20, 10>|0|<20, 30>| Under current logic, A's calculated to-preempt (how much resource other queue can preempt) will be <0, 20>. The preemption will not happen. {quote} I want to challenge the original example. The above does cause preemption. I have tested this scenario, and it does preempt. In my tests, the first resource is memory and the second is vcores. I think the reason is that the dominant resource calculator will determine that vcores is a higher percentage of the available resources than memory, so vcores is dominant. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8292.001.patch > > > > This is an example of the problem: (Same if we have more than 2 resources) > > Let's say we have 3 queues A/B/C. All containers with equal size <2,3> > > ||Queue||Guaranteed||Used ||Pending|| > |A|<20, 10>|<20,30>| | > |B|<20, 10>|0|0| > |C|<20, 10>|0|<20, 30>| > | | | | | > > Under current logic, A's calculated to-preempt (how much resource other queue > can preempt) will be <0, 20>. The preemption will not happen. However, under > the context of DRC, queue A is using more resource than guaranteed, so queue > C will be starved -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
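The dominance argument can be checked with quick arithmetic, assuming the cluster total is the sum of the three queue guarantees, i.e. <60 memory, 30 vcores> (an assumption for illustration; the example does not state the cluster size).
{code:java}
public class DominantShareCheck {
  public static void main(String[] args) {
    // Assumes cluster = <60 memory, 30 vcores>, the sum of the three
    // <20, 10> queue guarantees.
    double memShare = 20.0 / 60.0;   // queue A's memory share ~= 0.33
    double vcoreShare = 30.0 / 30.0; // queue A's vcore share   = 1.00
    // DRC treats the larger share as dominant: vcores here. A's 30 used
    // vcores exceed its 10-vcore guarantee, so A is over its guarantee on
    // its dominant resource and its containers are preemptable.
    System.out.println(vcoreShare > memShare ? "vcores dominant" : "memory dominant");
  }
}
{code}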
[jira] [Commented] (YARN-7933) [atsv2 read acls] Add TimelineWriter#writeDomain
[ https://issues.apache.org/jira/browse/YARN-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477991#comment-16477991 ] Haibo Chen commented on YARN-7933: -- Thanks [~rohithsharma] for the patch! I have checked it in trunk. > [atsv2 read acls] Add TimelineWriter#writeDomain > - > > Key: YARN-7933 > URL: https://issues.apache.org/jira/browse/YARN-7933 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vrushali C >Assignee: Rohith Sharma K S >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-7933.01.patch, YARN-7933.02.patch, > YARN-7933.03.patch, YARN-7933.04.patch, YARN-7933.05.patch, YARN-7933.06.patch > > > > Add an API TimelineWriter#writeDomain for writing the domain info -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7278) LinuxContainer in docker mode will be failed when nodemanager restart, because timeout for docker is too slow.
[ https://issues.apache.org/jira/browse/YARN-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7278: -- Labels: Docker (was: ) > LinuxContainer in docker mode will be failed when nodemanager restart, > because timeout for docker is too slow. > -- > > Key: YARN-7278 > URL: https://issues.apache.org/jira/browse/YARN-7278 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.0 > Environment: CentOS >Reporter: zhengchenyu >Priority: Major > Labels: Docker > Original Estimate: 1m > Remaining Estimate: 1m > > In our cluster, nodemanager recovery is turned on, and we use LinuxContainer > in docker mode. > Containers may fail when the nodemanager restarts; the exception is below: > {code} > [2017-09-29T15:47:14.433+08:00] [INFO] > containermanager.monitor.ContainersMonitorImpl.run(ContainersMonitorImpl.java > 472) [Container Monitor] : Memory usage of ProcessTree 120523 for > container-id container_1506600355508_0023_01_04: -1B of 10 GB physical > memory used; -1B of 31 GB virtual memory used > [2017-09-29T15:47:15.219+08:00] [ERROR] > containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java > 93) [ContainersLauncher #1] : Unable to recover container > container_1506600355508_0023_01_04 > java.io.IOException: Timeout while waiting for exit code from > container_1506600355508_0023_01_04 > [2017-09-29T15:47:15.220+08:00] [INFO] > containermanager.container.ContainerImpl.handle(ContainerImpl.java 1142) > [AsyncDispatcher event handler] : Container > container_1506600355508_0023_01_04 transitioned from RUNNING to > EXITED_WITH_FAILURE > [2017-09-29T15:47:15.221+08:00] [INFO] > containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java > 440) [AsyncDispatcher event handler] : Cleaning up container > container_1506600355508_0023_01_04 > {code} > I guess the process is done, but 2 seconds later (the variable is msecLeft) the > *.pid.exitcode still wasn't created. Then I changed the variable to 2ms, and the > container succeeded when the nodemanager restarted. > So I think the timeout is too short for the docker container to complete the work. > In docker mode of LinuxContainer, the NM monitors the real task, which is launched > by the "docker run" command. Then the "docker wait" command waits for the exit code, > and then "docker rm" deletes the docker container. Lastly, container-executor > writes the exit code. So if some docker command is slow enough, the NM > can't monitor the container. In fact, docker rm is always slow. > I think the exit code of docker rm doesn't matter to the real task, so I > think we could move the operation of writing "*.pid.exitcode" before the > "docker rm" command. Or we could monitor the "docker wait" process rather than the real > task. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
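The reordering proposed above, writing the exit code before the slow "docker rm", might look like the following. This is an illustrative Java sketch only (the real logic lives in the native container-executor and the generated session script), and the container name and exit-code file path are made up.
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustrative only: docker run -> docker wait -> write exit code -> docker rm,
// with the exit-code write moved ahead of the potentially slow cleanup.
public class ExitCodeOrderingSketch {
  static int run(String... cmd) throws IOException, InterruptedException {
    return new ProcessBuilder(cmd).inheritIO().start().waitFor();
  }

  public static void main(String[] args) throws Exception {
    String cid = "container_example_0001_01_000002"; // made-up name
    run("docker", "run", "-d", "--name", cid, "busybox", "true");

    // "docker wait" prints the container's exit code on stdout.
    Process wait = new ProcessBuilder("docker", "wait", cid).start();
    String exitCode = new String(wait.getInputStream().readAllBytes()).trim();
    wait.waitFor();

    // Write the exit code file *before* "docker rm", so a recovering
    // NodeManager is not held up waiting on slow container removal.
    Files.write(Paths.get(cid + ".pid.exitcode"), exitCode.getBytes());

    run("docker", "rm", cid);
  }
}
{code}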
[jira] [Updated] (YARN-7246) Fix the default docker binary path
[ https://issues.apache.org/jira/browse/YARN-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7246: -- Labels: Docker (was: ) > Fix the default docker binary path > -- > > Key: YARN-7246 > URL: https://issues.apache.org/jira/browse/YARN-7246 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Blocker > Labels: Docker > Fix For: 2.8.2 > > Attachments: YARN-7246-branch-2.8.2.001.patch, > YARN-7246-branch-2.8.2.002.patch, YARN-7246-branch-2.8.2.003.patch, > YARN-7246-branch-2.8.2.004.patch, YARN-7246-branch-2.8.2.005.patch, > YARN-7246-branch-2.8.2.006.patch, YARN-7246-branch-2.8.2.007.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8141: -- Labels: Docker (was: ) > YARN Native Service: Respect > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec > -- > > Key: YARN-8141 > URL: https://issues.apache.org/jira/browse/YARN-8141 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Labels: Docker > Attachments: YARN-8141.001.patch, YARN-8141.002.patch, > YARN-8141.003.patch, YARN-8141.004.patch > > > Existing YARN native service overwrites > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user > specified this in service spec or not. It is important to allow user to mount > local folders like /etc/passwd, etc. > Following logic overwrites the > YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment: > {code:java} > StringBuilder sb = new StringBuilder(); > for (Entry mount : mountPaths.entrySet()) { > if (sb.length() > 0) { > sb.append(","); > } > sb.append(mount.getKey()); > sb.append(":"); > sb.append(mount.getValue()); > } > env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", > sb.toString());{code} > Inside AbstractLauncher.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-4018: -- Labels: Docker (was: ) > correct docker image name is rejected by DockerContainerExecutor > > > Key: YARN-4018 > URL: https://issues.apache.org/jira/browse/YARN-4018 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Major > Labels: Docker > Attachments: YARN-4018.patch > > > For example: > "www.dockerbase.net/library/mongo" > "www.dockerbase.net:5000/library/mongo:latest" > leads to error: > Image: www.dockerbase.net/library/mongo is not a proper docker image > Image: www.dockerbase.net:5000/library/mongo:latest is not a proper docker > image -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8274) Docker command error during container relaunch
[ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8274: -- Labels: Docker (was: ) > Docker command error during container relaunch > -- > > Key: YARN-8274 > URL: https://issues.apache.org/jira/browse/YARN-8274 > Project: Hadoop YARN > Issue Type: Task >Reporter: Billie Rinaldi >Assignee: Jason Lowe >Priority: Critical > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8274.001.patch, YARN-8274.002.patch > > > I initiated container relaunch with a "sleep 60; exit 1" launch command and > saw a "not a docker command" error on relaunch. Haven't figured out why this > is happening, but it seems like it has been introduced recently to > trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger] > {noformat} > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Relaunch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: > container_1525897486447_0003_01_02 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7 > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception > message: Relaunch container failed > 2018-05-09 21:41:46,631 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error > output: docker: 'container_1525897486447_0003_01_02' is not a docker > command. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4016) docker container is still running when app is killed
[ https://issues.apache.org/jira/browse/YARN-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-4016: -- Labels: Docker (was: ) > docker container is still running when app is killed > > > Key: YARN-4016 > URL: https://issues.apache.org/jira/browse/YARN-4016 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Major > Labels: Docker > > The docker_container_executor_session.sh is generated like below: > {code} > ### get the pid of docker container by "docker inspect" > echo `/usr/bin/docker inspect --format {{.State.Pid}} > container_1438681002528_0001_01_02` > > .../container_1438681002528_0001_01_02.pid.tmp > ### rename *.pid.tmp to *.pid > /bin/mv -f .../container_1438681002528_0001_01_02.pid.tmp > .../container_1438681002528_0001_01_02.pid > ### launch the docker container > /usr/bin/docker run --rm --net=host --name > container_1438681002528_0001_01_02 -v ... library/mysql > /container_1438681002528_0001_01_02/launch_container.sh" > {code} > This is obviously wrong because you cannot get the pid of a docker container > before starting it. When the NodeManager tries to kill the container, a pid of zero is > always read from the pid file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3201) add args for DistributedShell to specify a image for tasks that will run on docker
[ https://issues.apache.org/jira/browse/YARN-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-3201: -- Labels: Docker (was: ) > add args for DistributedShell to specify a image for tasks that will run on > docker > -- > > Key: YARN-3201 > URL: https://issues.apache.org/jira/browse/YARN-3201 > Project: Hadoop YARN > Issue Type: Wish > Components: applications/distributed-shell >Reporter: zhangwei >Assignee: yarntime >Priority: Major > Labels: Docker > > It's very useful to execute a script on docker to do some tests, but the > distributedshell has no args to set the image. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2976) Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name
[ https://issues.apache.org/jira/browse/YARN-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-2976: -- Labels: Docker (was: ) > Invalid docs for specifying > yarn.nodemanager.docker-container-executor.exec-name > > > Key: YARN-2976 > URL: https://issues.apache.org/jira/browse/YARN-2976 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 2.6.0 >Reporter: Hitesh Shah >Assignee: Vijay Bhat >Priority: Minor > Labels: Docker > > Docs on > http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html > mention setting "docker -H=tcp://0.0.0.0:4243" for > yarn.nodemanager.docker-container-executor.exec-name. > However, the actual implementation does a fileExists for the specified value. > Either the docs need to be fixed or the impl changed to allow relative paths > or commands with additional args -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-3302: -- Labels: Docker (was: ) > TestDockerContainerExecutor should run automatically if it can detect docker > in the usual place > --- > > Key: YARN-3302 > URL: https://issues.apache.org/jira/browse/YARN-3302 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0 >Reporter: Ravi Prakash >Assignee: Ravindra Kumar Naik >Priority: Major > Labels: Docker > Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, > YARN-3302-trunk.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5426) Enable distributed shell to launch docker containers
[ https://issues.apache.org/jira/browse/YARN-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-5426: -- Labels: Docker (was: ) > Enable distributed shell to launch docker containers > > > Key: YARN-5426 > URL: https://issues.apache.org/jira/browse/YARN-5426 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Priority: Major > Labels: Docker > > This could be the easiest way to bring up docker containers on YARN. > An option like -docker , or with the ability to run different types of > images with locality requirement altogether. In short, I think making docker > container first-class thing in distributed shell is useful for testing > docker-based services on YARN in the long term. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6387) Provide a flag in Rest API GET response to notify if the app launch delay is due to docker image download.
[ https://issues.apache.org/jira/browse/YARN-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6387: -- Labels: Docker (was: ) > Provide a flag in Rest API GET response to notify if the app launch delay is > due to docker image download. > -- > > Key: YARN-6387 > URL: https://issues.apache.org/jira/browse/YARN-6387 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: sriharsha devineni >Priority: Major > Labels: Docker > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3289) Docker images should be downloaded during localization
[ https://issues.apache.org/jira/browse/YARN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-3289: -- Labels: Docker (was: ) > Docker images should be downloaded during localization > -- > > Key: YARN-3289 > URL: https://issues.apache.org/jira/browse/YARN-3289 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Ravi Prakash >Priority: Major > Labels: Docker > > We currently call docker run on images while launching containers. If the > image size is sufficiently big, the task will time out. We should download the > image we want to run during localization (if possible) to prevent this -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6454) Add support for setting environment variables for docker containers
[ https://issues.apache.org/jira/browse/YARN-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6454: -- Labels: Docker (was: ) > Add support for setting environment variables for docker containers > --- > > Key: YARN-6454 > URL: https://issues.apache.org/jira/browse/YARN-6454 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Jaeboo Jeong >Priority: Major > Labels: Docker > Attachments: YARN-6454.001.patch > > > Docker allows you to set environment variables for your containers with the -e > flag. > You can set environment variables like below. > {code} > YARN_CONTAINER_RUNTIME_DOCKER_ENVIRONMENT_VARIABLES="HADOOP_CONF_DIR=$HADOOP_CONF_DIR,HADOOP_HDFS_HOME=/opt/hadoop" > {code} > If you want to set a list of values, apply YARN-6434 first. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7996) Allow user supplied Docker client configurations with YARN native services
[ https://issues.apache.org/jira/browse/YARN-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7996: -- Labels: Docker (was: ) > Allow user supplied Docker client configurations with YARN native services > -- > > Key: YARN-7996 > URL: https://issues.apache.org/jira/browse/YARN-7996 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-7996.001.patch, YARN-7996.002.patch, > YARN-7996.003.patch, YARN-7996.004.patch, YARN-7996.005.patch, > YARN-7996.006.patch > > > YARN-5428 added support to distributed shell for supplying a Docker client > configuration at application submission time. The auth tokens within the > client configuration are then used to pull images from private Docker > repositories/registries. Add the same support to the YARN Native Services > framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-1964: -- Labels: Docker (was: ) > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab >Priority: Major > Labels: Docker > Fix For: 2.6.0 > > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > *This alpha feature has been deprecated in branch-2 and removed from trunk* > Please see https://issues.apache.org/jira/browse/YARN-5388 > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5505) Create an agent-less docker provider in the native-services framework
[ https://issues.apache.org/jira/browse/YARN-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-5505: -- Labels: Docker (was: ) > Create an agent-less docker provider in the native-services framework > - > > Key: YARN-5505 > URL: https://issues.apache.org/jira/browse/YARN-5505 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Labels: Docker > Fix For: yarn-native-services > > Attachments: YARN-5505-yarn-native-services.001.patch, > YARN-5505-yarn-native-services.002.patch > > > The Slider AM has a pluggable portion called a provider. Currently the only > provider implementation is the agent provider which contains the bulk of the > agent-related Java code. We can implement a docker provider that does not use > the agent and gets information it needs directly from the NM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3988) DockerContainerExecutor should allow users to specify "docker run" parameters
[ https://issues.apache.org/jira/browse/YARN-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-3988: -- Labels: Docker (was: ) > DockerContainerExecutor should allow users to specify "docker run" parameters > - > > Key: YARN-3988 > URL: https://issues.apache.org/jira/browse/YARN-3988 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chen He >Assignee: Chen He >Priority: Major > Labels: Docker > > In the current DockerContainerExecutor, the "docker run" command has fixed > parameters: > String commandStr = commands.append(dockerExecutor) > .append(" ") > .append("run") > .append(" ") > .append("--rm --net=host") > .append(" ") > .append(" --name " + containerIdStr) > .append(localDirMount) > .append(logDirMount) > .append(containerWorkDirMount) > .append(" ") > .append(containerImageName) > .toString(); > This is not flexible enough for users who want to start a docker container > with extra volume(s) attached or with other "docker run" parameters. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
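A minimal sketch of the flexibility being asked for, assuming a new configuration property whose value is appended to the hard-coded arguments (the property name below is hypothetical, not an actual YARN key):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class DockerRunCommand {
  // Hypothetical property; not an actual YARN configuration key.
  static final String EXTRA_RUN_ARGS =
      "yarn.nodemanager.docker-container-executor.extra-run-args";

  static String buildRunCommand(Configuration conf, String dockerExecutor,
      String containerIdStr, String mounts, String image) {
    String extra = conf.getTrimmed(EXTRA_RUN_ARGS, "");
    return new StringBuilder()
        .append(dockerExecutor)
        .append(" run --rm --net=host")
        .append(" --name ").append(containerIdStr)
        .append(extra.isEmpty() ? "" : " " + extra) // e.g. "-v /data:/data"
        .append(mounts)
        .append(' ').append(image)
        .toString();
  }
}
{code}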
[jira] [Updated] (YARN-8181) Docker container run_time
[ https://issues.apache.org/jira/browse/YARN-8181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8181: -- Labels: Docker (was: ) > Docker container run_time > - > > Key: YARN-8181 > URL: https://issues.apache.org/jira/browse/YARN-8181 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Seyyed Ahmad Javadi >Priority: Major > Labels: Docker > > Hi All, > I want to use the Docker container runtime but could not solve the problem I am > facing. I am following the guide below, and the NM log is shown below. I cannot > see any docker containers being created. It works when I use the default LCE. > Please also find how I submit a job at the end as well. > Do you have any guide on how I can make the Docker run_time work? > Could you please let me know how I can use the LCE binary to make sure my docker > setup is correct? > I confirmed that "docker run" works fine. I really like this developing > feature and would like to contribute to it. Many thanks in advance. > [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/DockerContainers.html] > {code:java} > NM LOG: > ... > 2018-04-19 11:49:24,568 INFO SecurityLogger.org.apache.hadoop.ipc.Server: > Auth successful for appattempt_1524151293356_0005_01 (auth:SIMPLE) > 2018-04-19 11:49:24,580 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Start request for container_1524151293356_0005_01_01 by user ubuntu > 2018-04-19 11:49:24,584 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Creating a new application reference for app application_1524151293356_0005 > 2018-04-19 11:49:24,584 INFO > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=ubuntu > IP=130.245.127.176 OPERATION=Start Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1524151293356_0005 > CONTAINERID=container_1524151293356_0005_01_01 > 2018-04-19 11:49:24,585 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Application application_1524151293356_0005 transitioned from NEW to INITING > 2018-04-19 11:49:24,585 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Adding container_1524151293356_0005_01_01 to application > application_1524151293356_0005 > 2018-04-19 11:49:24,585 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Application application_1524151293356_0005 transitioned from INITING to > RUNNING > 2018-04-19 11:49:24,588 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1524151293356_0005_01_01 transitioned from NEW to > LOCALIZING > 2018-04-19 11:49:24,588 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got > event CONTAINER_INIT for appId application_1524151293356_0005 > 2018-04-19 11:49:24,589 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Created localizer for container_1524151293356_0005_01_01 > 2018-04-19 11:49:24,616 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Writing credentials to the nmPrivate file > /tmp/hadoop-ubuntu/nm-local-dir/nmPrivate/container_1524151293356_0005_01_01.tokens > 2018-04-19 11:49:28,090 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1524151293356_0005_01_01 transitioned from > 
LOCALIZING to SCHEDULED > 2018-04-19 11:49:28,090 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Starting container [container_1524151293356_0005_01_01] > 2018-04-19 11:49:28,212 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1524151293356_0005_01_01 transitioned from SCHEDULED > to RUNNING > 2018-04-19 11:49:28,212 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: > Starting resource-monitoring for container_1524151293356_0005_01_01 > 2018-04-19 11:49:29,401 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: > Container container_1524151293356_0005_01_01 succeeded > 2018-04-19 11:49:29,401 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1524151293356_0005_01_01 transitioned from RUNNING > to EXITED_WITH_SUCCESS > 2018-04-19 11:49:29,401 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: > Cleaning up container container_1524151293356_0005_01_000
[jira] [Updated] (YARN-6160) Create an agent-less docker-less provider in the native services framework
[ https://issues.apache.org/jira/browse/YARN-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6160: -- Labels: Docker (was: ) > Create an agent-less docker-less provider in the native services framework > -- > > Key: YARN-6160 > URL: https://issues.apache.org/jira/browse/YARN-6160 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Labels: Docker > Fix For: yarn-native-services > > Attachments: YARN-6160-yarn-native-services.001.patch, > YARN-6160-yarn-native-services.002.patch, > YARN-6160-yarn-native-services.003.patch > > > The goal of the agent-less docker-less provider is to be able to use the YARN > native services framework when Docker is not installed or other methods of > app resource installation are preferable. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8231) Dshell application fails when one of the docker containers gets killed
[ https://issues.apache.org/jira/browse/YARN-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8231: -- Labels: Docker (was: ) > Dshell application fails when one of the docker containers gets killed > - > > Key: YARN-8231 > URL: https://issues.apache.org/jira/browse/YARN-8231 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Priority: Critical > Labels: Docker > > 1) Launch dshell application > {code} > yarn jar hadoop-yarn-applications-distributedshell-*.jar -shell_command > "sleep 300" -num_containers 2 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker > -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest > -keep_containers_across_application_attempts -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} > 2) Kill container_1524681858728_0012_01_02 > Expected behavior: > The application should start a new instance and finish successfully > Actual behavior: > The application failed as soon as the container was killed > {code:title=AM log} > 18/04/27 23:05:12 INFO distributedshell.ApplicationMaster: Got response from > RM for container ask, completedCnt=1 > 18/04/27 23:05:12 INFO distributedshell.ApplicationMaster: > appattempt_1524681858728_0012_01 got container status for > containerID=container_1524681858728_0012_01_02, state=COMPLETE, > exitStatus=137, diagnostics=[2018-04-27 23:05:09.310]Container killed on > request. Exit code is 137 > [2018-04-27 23:05:09.331]Container exited with a non-zero exit code 137. > [2018-04-27 23:05:09.332]Killed by external signal > 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Got response from > RM for container ask, completedCnt=1 > 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: > appattempt_1524681858728_0012_01 got container status for > containerID=container_1524681858728_0012_01_03, state=COMPLETE, > exitStatus=0, diagnostics= > 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Container > completed successfully., containerId=container_1524681858728_0012_01_03 > 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Application > completed. Stopping running containers > 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Application > completed. Signalling finish to RM > 18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Diagnostics., > total=2, completed=2, allocated=2, failed=1 > 18/04/27 23:08:46 INFO impl.AMRMClientImpl: Waiting for application to be > successfully unregistered.{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7416) Use "docker volume inspect" to make sure that volumes for GPU drivers/libs are properly mounted.
[ https://issues.apache.org/jira/browse/YARN-7416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7416: -- Labels: Docker (was: ) > Use "docker volume inspect" to make sure that volumes for GPU drivers/libs > are properly mounted. > - > > Key: YARN-7416 > URL: https://issues.apache.org/jira/browse/YARN-7416 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Labels: Docker > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
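The check itself is straightforward: "docker volume inspect" exits non-zero when the named volume does not exist. A sketch of the idea (illustrative only; the real check is driven from the NM/container-executor path):
{code:java}
import java.io.IOException;

public class VolumeCheck {
  // "docker volume inspect <name>" exits non-zero if the volume is missing.
  static boolean volumeExists(String volumeName)
      throws IOException, InterruptedException {
    Process p = new ProcessBuilder("docker", "volume", "inspect", volumeName)
        .redirectErrorStream(true)
        .start();
    return p.waitFor() == 0;
  }
}
{code}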
[jira] [Updated] (YARN-6804) Allow custom hostname for docker containers in native services
[ https://issues.apache.org/jira/browse/YARN-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6804: -- Labels: Docker (was: ) > Allow custom hostname for docker containers in native services > -- > > Key: YARN-6804 > URL: https://issues.apache.org/jira/browse/YARN-6804 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Labels: Docker > Fix For: 2.9.0, yarn-native-services, 3.0.0-beta1 > > Attachments: YARN-6804-branch-2.8.01.patch, > YARN-6804-trunk.004.patch, YARN-6804-trunk.005.patch, > YARN-6804-yarn-native-services.001.patch, > YARN-6804-yarn-native-services.002.patch, > YARN-6804-yarn-native-services.003.patch, > YARN-6804-yarn-native-services.004.patch, > YARN-6804-yarn-native-services.005.patch > > > Instead of the default random docker container hostname, we could set a more > user-friendly hostname for the container. The default could be a hostname > based on the container ID, with an option for the AM to provide a different > hostname. In the case of the native services AM, we could provide the > hostname that would be created by the registry DNS server. Regardless of > whether or not registry DNS is enabled, this would be a more useful hostname > for the docker container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
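A sketch of the container-ID-based default described above; the naming scheme and domain handling here are assumptions for illustration, not the committed implementation:
{code:java}
public class ContainerHostname {
  // e.g. container_e02_1517516543573_0012_01_000002
  //   -> ctr-e02-1517516543573-0012-01-000002.example.domain
  static String hostnameFor(String containerIdStr, String domain) {
    String host = containerIdStr.replaceFirst("^container_", "ctr-")
        .replace('_', '-');
    return (domain == null || domain.isEmpty()) ? host : host + "." + domain;
  }
}
{code}
The AM could then override this default with the registry DNS hostname when one is available.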
[jira] [Updated] (YARN-2466) Umbrella issue for Yarn launched Docker Containers
[ https://issues.apache.org/jira/browse/YARN-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-2466: -- Labels: Docker (was: ) > Umbrella issue for Yarn launched Docker Containers > -- > > Key: YARN-2466 > URL: https://issues.apache.org/jira/browse/YARN-2466 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.4.1 >Reporter: Abin Shahab >Priority: Major > Labels: Docker > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to package their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). > In addition to the software isolation mentioned above, Docker containers will > provide resource, network, and user-namespace isolation. > Docker provides resource isolation through cgroups, similar to > LinuxContainerExecutor. This prevents one job from taking other jobs' > resources (memory and CPU) on the same hadoop cluster. > User-namespace isolation will ensure that the root on the container is mapped > to an unprivileged user on the host. This is currently being added to Docker. > Network isolation will ensure that one user’s network traffic is completely > isolated from another user’s network traffic. > Last but not least, the interaction of Docker and Kerberos will have to > be worked out. These Docker containers must work in a secure hadoop > environment. > Additional details are here: > https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8208) Add log statement for Docker client configuration file at INFO level
[ https://issues.apache.org/jira/browse/YARN-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8208: -- Labels: Docker (was: ) > Add log statement for Docker client configuration file at INFO level > > > Key: YARN-8208 > URL: https://issues.apache.org/jira/browse/YARN-8208 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Minor > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8208.001.patch, YARN-8208.002.patch, > YARN-8208.003.patch > > > The log statement indicating the source of the Docker client configuration > file should be moved to INFO level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6622) Document Docker work as experimental
[ https://issues.apache.org/jira/browse/YARN-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6622: -- Labels: Docker (was: ) > Document Docker work as experimental > > > Key: YARN-6622 > URL: https://issues.apache.org/jira/browse/YARN-6622 > Project: Hadoop YARN > Issue Type: Task > Components: documentation >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Labels: Docker > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: YARN-6622.001.patch > > > We should update the Docker support documentation calling out the Docker work > as experimental. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-3324: -- Labels: BB2015-05-TBR Docker (was: BB2015-05-TBR) > TestDockerContainerExecutor should clean test docker image from local > repository after test is done > --- > > Key: YARN-3324 > URL: https://issues.apache.org/jira/browse/YARN-3324 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0 >Reporter: Chen He >Priority: Major > Labels: BB2015-05-TBR, Docker > Attachments: YARN-3324-branch-2.6.0.002.patch, > YARN-3324-trunk.002.patch > > > Currently, TestDockerContainerExecutor only cleans the temp directory in the > local file system but leaves the test docker image in the local docker > repository. The image should be cleaned up as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6091) The AppMaster registration failed when using Docker on LinuxContainer
[ https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6091: -- Labels: Docker (was: ) > The AppMaster registration failed when using Docker on LinuxContainer > > > Key: YARN-6091 > URL: https://issues.apache.org/jira/browse/YARN-6091 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, yarn >Affects Versions: 2.8.1 > Environment: CentOS >Reporter: zhengchenyu >Assignee: Eric Badger >Priority: Critical > Labels: Docker > Attachments: YARN-6091.001.patch, YARN-6091.002.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > On some servers, when I use Docker on LinuxContainer, I found that the > AppMaster's registration with the ResourceManager failed, but this did not > happen on other servers. > I found that pclose (in container-executor.c) returns different values on > different servers, even though the process launched by popen is running > normally. Some servers return 0, and others return 13. > Because yarn regards the application as failed when pclose returns nonzero, > yarn removes the AMRMToken, and the AppMaster registration then fails because > the ResourceManager has already removed this application's token. > In container-executor.c, the judgement condition is whether the return code > is zero. But according to the pclose man page, only a return value of -1 > indicates an error. So I changed the judgement condition, which solves this > problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7361) Improve the docker container runtime documentation
[ https://issues.apache.org/jira/browse/YARN-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7361: -- Labels: Docker (was: ) > Improve the docker container runtime documentation > -- > > Key: YARN-7361 > URL: https://issues.apache.org/jira/browse/YARN-7361 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Fix For: 2.8.3, 3.1.0, 2.9.1, 3.0.1 > > Attachments: YARN-7361.001.patch, YARN-7361.002.patch > > > During review of YARN-7230, it was found that > yarn.nodemanager.runtime.linux.docker.capabilities is missing from the docker > containers documentation in most of the active branches. We can also improve > the warning that was introduced in YARN-6622. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8285) Remove unused environment variables from the Docker runtime
[ https://issues.apache.org/jira/browse/YARN-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8285: -- Labels: Docker (was: ) > Remove unused environment variables from the Docker runtime > --- > > Key: YARN-8285 > URL: https://issues.apache.org/jira/browse/YARN-8285 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Eric Badger >Priority: Trivial > Labels: Docker > Attachments: YARN-8285.001.patch > > > YARN-7430 enabled user remapping for Docker containers by default. As a > result, YARN_CONTAINER_RUNTIME_DOCKER_RUN_ENABLE_USER_REMAPPING is no longer > used and can be removed. > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE_FILE was added in the original > implementation, but was never used and can be removed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7224) Support GPU isolation for docker container
[ https://issues.apache.org/jira/browse/YARN-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7224: -- Labels: Docker (was: ) > Support GPU isolation for docker container > -- > > Key: YARN-7224 > URL: https://issues.apache.org/jira/browse/YARN-7224 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Labels: Docker > Fix For: 3.1.0 > > Attachments: YARN-7224.001.patch, YARN-7224.002-wip.patch, > YARN-7224.003.patch, YARN-7224.004.patch, YARN-7224.005.patch, > YARN-7224.006.patch, YARN-7224.007.patch, YARN-7224.008.patch, > YARN-7224.009.patch > > > This patch addresses the following issues when a docker container is used: > 1. GPU driver and NVIDIA libraries: If GPU drivers and NVIDIA libraries are > pre-packaged inside the docker image, they can conflict with the driver and > NVIDIA libraries installed on the host OS. An alternative solution is to detect > the host OS's installed drivers and devices and mount them when launching the > docker container. Please refer to \[1\] for more details. > 2. Image detection: > From \[2\], the challenge is: > bq. Mounting user-level driver libraries and device files clobbers the > environment of the container, it should be done only when the container is > running a GPU application. The challenge here is to determine if a given > image will be using the GPU or not. We should also prevent launching > containers based on a Docker image that is incompatible with the host NVIDIA > driver version, you can find more details on this wiki page. > 3. GPU isolation. > *Proposed solution*: > a. Use nvidia-docker-plugin \[3\] to address issue #1; this is the same > solution used by K8S \[4\]. Issue #2 could be addressed in a separate JIRA. > We won't ship nvidia-docker-plugin with our releases, and we require the cluster > admin to preinstall nvidia-docker-plugin to use GPU+docker support on YARN. > "nvidia-docker" is a wrapper around the docker binary which could address #3 as > well; however, "nvidia-docker" does not provide the same semantics as docker, and > it needs additional environment setup such as PATH/LD_LIBRARY_PATH to use > it. To avoid introducing additional issues, we plan to use the > nvidia-docker-plugin + docker binary approach. > b. To handle the GPU driver and NVIDIA libraries, we use nvidia-docker-plugin > \[3\] to create a volume which includes the GPU-related libraries and mount it > when the docker container is launched. Changes include: > - Instead of using {{volume-driver}}, this patch added a {{docker volume > create}} command to c-e and the NM Java side. The reason is that {{volume-driver}} > can only use a single volume driver for each launched docker container. > - Updated {{c-e}} and the Java side: if a mounted volume is a named volume in > docker, skip checking file existence. (Named volumes still need to be added to > the permitted list in container-executor.cfg.) > c. To address the isolation issue: > We found that cgroup + docker doesn't work under newer docker versions, which > use {{runc}} as the default runtime. Setting {{--cgroup-parent}} to a cgroup > which includes any {{devices.deny}} rule prevents the docker container from > launching. > Instead, this patch passes the allowed GPU devices via {{--device}} to the > docker launch command. 
> References: > \[1\] https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver > \[2\] https://github.com/NVIDIA/nvidia-docker/wiki/Image-inspection > \[3\] https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin > \[4\] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
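A sketch of point (c), passing the allowed GPU devices explicitly instead of relying on a devices cgroup; the device paths assume standard NVIDIA device nodes, and names here are illustrative:
{code:java}
import java.util.List;

public class GpuDeviceArgs {
  static void addGpuDevices(List<Integer> allowedMinorNumbers,
      List<String> dockerRunArgs) {
    // One --device flag per allowed GPU, e.g. /dev/nvidia0, /dev/nvidia1.
    for (int minor : allowedMinorNumbers) {
      dockerRunArgs.add("--device=/dev/nvidia" + minor);
    }
    // Control devices required by the driver (assumption: standard NVIDIA setup).
    dockerRunArgs.add("--device=/dev/nvidiactl");
    dockerRunArgs.add("--device=/dev/nvidia-uvm");
  }
}
{code}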
[jira] [Updated] (YARN-7811) Service AM should use configured default docker network
[ https://issues.apache.org/jira/browse/YARN-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7811: -- Labels: Docker (was: ) > Service AM should use configured default docker network > --- > > Key: YARN-7811 > URL: https://issues.apache.org/jira/browse/YARN-7811 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Labels: Docker > Fix For: 3.1.0 > > Attachments: YARN-7811.01.patch > > > Currently the DockerProviderService used by the Service AM hardcodes a > default of bridge for the docker network. We already have a YARN > configuration property for default network, so the Service AM should honor > that. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5879) Correctly handle docker.image and launch command when unique_component_support is specified
[ https://issues.apache.org/jira/browse/YARN-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-5879: -- Labels: Docker (was: ) > Correctly handle docker.image and launch command when > unique_component_support is specified > --- > > Key: YARN-5879 > URL: https://issues.apache.org/jira/browse/YARN-5879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Reporter: Wangda Tan >Priority: Major > Labels: Docker > Attachments: YARN-5879-yarn-native-services.poc.1.patch > > > Found two issues. When {{unique_component_support}} is set to true, the server > returns an error message like: > {code} > { > "diagnostics": "Property docker.image not specified for > {component-name}-{component-id}" > } > {code} > In addition, the launch command cannot handle patterns like > {{COMPONENT_ID}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8287) Update documentation and yarn-default related to the Docker runtime
[ https://issues.apache.org/jira/browse/YARN-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8287: -- Labels: Docker (was: ) > Update documentation and yarn-default related to the Docker runtime > --- > > Key: YARN-8287 > URL: https://issues.apache.org/jira/browse/YARN-8287 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Priority: Minor > Labels: Docker > > There are a few typos and omissions in the documentation and yarn-default wrt > running Docker containers on YARN. Below is what I noticed, but a more > thorough review is still needed: > * docker.allowed.volume-drivers is not documented > * None of the GPU or FPGA related items are in the Docker docs. > * "To run without any capabilites," - typo in yarn-default.xml > * remove from yarn-default.xml > * yarn.nodemanager.runtime.linux.docker.delayed-removal.allowed missing from > docs > * yarn.nodemanager.runtime.linux.docker.stop.grace-period missing from docs > * The user remapping features are missing from the docs, we should > explicitly call this out. > * The privileged container section could use a bit of rework to outline the > risks of the feature. > * Is it time to remove the security warnings? The community has made many > improvements since that warning was added. > * "path within the contatiner" typo -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8160: -- Labels: Docker (was: ) > Yarn Service Upgrade: Support upgrade of service that use docker containers > > > Key: YARN-8160 > URL: https://issues.apache.org/jira/browse/YARN-8160 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Labels: Docker > > Ability to upgrade dockerized yarn native services. > Ref: YARN-5637 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8284) get_docker_command refactoring
[ https://issues.apache.org/jira/browse/YARN-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8284: -- Labels: Docker (was: ) > get_docker_command refactoring > -- > > Key: YARN-8284 > URL: https://issues.apache.org/jira/browse/YARN-8284 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.2.0, 3.1.1 >Reporter: Jason Lowe >Assignee: Eric Badger >Priority: Minor > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8284.001.patch, YARN-8284.002.patch > > > YARN-8274 occurred because get_docker_command's helper functions each have to > remember to put the docker binary as the first argument. This is error prone > and causes code duplication for each of the helper functions. It would be > safer and simpler if get_docker_command initialized the docker binary > argument in one place and each of the helper functions only added the > arguments specific to their particular docker sub-command. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
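The real get_docker_command lives in the container-executor C code; the refactoring pattern itself is language-agnostic. A sketch, shown in Java purely for illustration:
{code:java}
import java.util.ArrayList;
import java.util.List;

public class DockerCommand {
  static List<String> getDockerCommand(String dockerBinary,
      String subCommand, List<String> subCommandArgs) {
    List<String> cmd = new ArrayList<>();
    cmd.add(dockerBinary);      // the binary is added in exactly one place
    cmd.add(subCommand);        // e.g. "run", "inspect", "rm"
    cmd.addAll(subCommandArgs); // helpers contribute only their own arguments
    return cmd;
  }
}
{code}
With this shape, a helper that forgets nothing about the binary cannot reproduce the YARN-8274 class of bug.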
[jira] [Updated] (YARN-7878) Docker container IP detail missing when service is in STABLE state
[ https://issues.apache.org/jira/browse/YARN-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7878: -- Labels: Docker (was: ) > Docker container IP detail missing when service is in STABLE state > -- > > Key: YARN-7878 > URL: https://issues.apache.org/jira/browse/YARN-7878 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Priority: Critical > Labels: Docker > > Scenario > 1) Launch Hbase on docker app > 2) Validate yarn service status using cli > {code:java} > {"name":"hbase-app-with-docker","id":"application_1517516543573_0012","artifact":{"id":"hbase-centos","type":"DOCKER"},"lifetime":3519,"components":[{"name":"hbasemaster","dependencies":[],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"2048"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_MASTER_OPTS":"-Xmx2048m > > -Xms1024m","HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_02","ip":"10.0.0.9","hostname":"hbasemaster-0.hbase-app-with-docker.hrt-qa.test.com","state":"READY","launch_time":1517533029963,"bare_host":"xxx","component_name":"hbasemaster-0"}],"launch_command":"sleep > 15; /usr/hdp/current/hbase-master/bin/hbase master > start","number_of_containers":1,"run_privileged_container":false},{"name":"regionserver","dependencies":["hbasemaster"],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"2048"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_REGIONSERVER_OPTS":"-XX:CMSInitiatingOccupancyFraction=70 > -Xmx2048m > -Xms1024m","HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.regionserver.hostname":"${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_05","state":"READY","launch_time":1517533059022,"bare_host":"xxx","component_name":"regionserver-0"}],"launch_command":"sleep > 15; /usr/hdp/current/hbase-regionserver/bin/hbase regionserver > 
start","number_of_containers":1,"run_privileged_container":false},{"name":"hbaseclient","dependencies":[],"artifact":{"id":"hbase-centos","type":"DOCKER"},"resource":{"cpus":1,"memory":"1024"},"state":"STABLE","configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_LOG_DIR":""},"files":[{"type":"XML","properties":{"hbase.zookeeper.quorum":"${CLUSTER_ZK_QUORUM}","zookeeper.znode.parent":"${SERVICE_ZK_PATH}","hbase.rootdir":"${SERVICE_HDFS_DIR}/hbase","hbase.master.hostname":"hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}","hbase.master.info.port":"16010","hbase.cluster.distributed":"true"},"dest_file":"/etc/hbase/conf/hbase-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hdfs-site.xml"}]},"quicklinks":[],"containers":[{"id":"container_e02_1517516543573_0012_01_03","ip":"10.0.0.8","hostname":"hbaseclient-0.hbase-app-with-docker.hrt-qa.test.com","state":"READY","launch_time":1517533029964,"bare_host":"xxx","component_name":"hbaseclient-0"}],"launch_command":"sleep > > infinity","number_of_containers":1,"run_privileged_container":false}],"configuration":{"properties":{"docker.network":"hadoop"},"env":{"HBASE_LOG_DIR":""},"files":[{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/core-site.xml","src_file":"core-site.xml"},{"type":"TEMPLATE","properties":{},"dest_file":"/etc/hadoop/conf/hdfs-site.xml","src_file":"hd
[jira] [Updated] (YARN-7805) Yarn should update container as failed on docker container failure
[ https://issues.apache.org/jira/browse/YARN-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7805: -- Labels: Docker (was: ) > Yarn should update container as failed on docker container failure > -- > > Key: YARN-7805 > URL: https://issues.apache.org/jira/browse/YARN-7805 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Priority: Major > Labels: Docker > > Steps: > Start the hbase yarn service example on docker. > When the Hbase master fails, it leads the master daemon's docker container to > fail. > {code} > [root@xx bin]# docker ps -a > CONTAINER IDIMAGE > COMMAND CREATED STATUS > PORTS NAMES > a57303b1a736x/xxxhbase:x.x.x.x.0.0.0 "bash /grid/0/hadoop/" 5 > minutes ago Exited (1) 4 minutes ago > container_e07_1516734339938_0018_01_02 > [root@xxx bin]# docker exec -it a57303b1a736 bash > Error response from daemon: Container > a57303b1a7364a733428ec76581368253e5a701560a510204b8c302e3bbeed26 is not > running > {code} > Expected behavior: > Yarn should mark this container as failed and start a new docker container. > Actual behavior: > Yarn did not detect that the container had failed. It kept showing the > container status as Running. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7412) test_docker_util.test_check_mount_permitted() is failing
[ https://issues.apache.org/jira/browse/YARN-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7412: -- Labels: Docker (was: ) > test_docker_util.test_check_mount_permitted() is failing > > > Key: YARN-7412 > URL: https://issues.apache.org/jira/browse/YARN-7412 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha4 >Reporter: Haibo Chen >Assignee: Eric Badger >Priority: Critical > Labels: Docker > Fix For: 3.0.0 > > Attachments: YARN-7412.001.patch > > > Test output > classname="TestDockerUtil"> >message="/home/haibochen/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc:444 > Expected: itr->second Which is: 1 To be equal to: > ret Which is: 0 for inp > ut /usr/bin/touch" > type=""> > > classname="TestDockerUtil"> >message="/home/haibochen/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc:462 > Expected: expected[i] Which is: > "/usr/bin/touch" To be equal to: ptr[i] > Which is: "/bin/touch"" > type=""> > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7717) Add configuration consistency for module.enabled and docker.privileged-containers.enabled
[ https://issues.apache.org/jira/browse/YARN-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7717: -- Labels: Docker (was: ) > Add configuration consistency for module.enabled and > docker.privileged-containers.enabled > - > > Key: YARN-7717 > URL: https://issues.apache.org/jira/browse/YARN-7717 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Yesha Vora >Assignee: Eric Badger >Priority: Major > Labels: Docker > Fix For: 3.1.0 > > Attachments: YARN-7717.001.patch, YARN-7717.002.patch, > YARN-7717.003.patch, YARN-7717.004.patch > > > container-executor.cfg has two properties related to dockerization: > 1) module.enabled = true/false > 2) docker.privileged-containers.enabled = 1/0 > Here, the two properties take different values to enable / disable their > features: module.enabled takes a true/false string, while > docker.privileged-containers.enabled takes a 1/0 integer value. > The behavior of these properties should be consistent. Both properties should > take a true or false string as the value to enable or disable the feature. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
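One backward-compatible way to converge the two formats is to accept both spellings during a transition. The real parsing happens in container-executor's C code; this Java sketch just shows the acceptance logic:
{code:java}
import java.util.Locale;

public class FlagParser {
  static boolean parseEnabled(String value, boolean defaultValue) {
    if (value == null) {
      return defaultValue;
    }
    switch (value.trim().toLowerCase(Locale.ROOT)) {
      case "true": case "1": return true;   // legacy 1/0 still accepted
      case "false": case "0": return false;
      default:
        throw new IllegalArgumentException("Invalid boolean flag: " + value);
    }
  }
}
{code}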
[jira] [Updated] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image
[ https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-2981: -- Labels: Docker oct16-easy (was: oct16-easy) > DockerContainerExecutor must support a Cluster-wide default Docker image > > > Key: YARN-2981 > URL: https://issues.apache.org/jira/browse/YARN-2981 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Abin Shahab >Assignee: Abin Shahab >Priority: Major > Labels: Docker, oct16-easy > Attachments: YARN-2981.patch, YARN-2981.patch, YARN-2981.patch, > YARN-2981.patch > > > This allows the yarn administrator to add a cluster-wide default docker image > that will be used when there is no per-job override of the docker image. With > this feature, it would be convenient for newer applications like slider to > launch inside a cluster-default docker container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7677) Docker image cannot set HADOOP_CONF_DIR
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-7677: -- Labels: Docker (was: ) > Docker image cannot set HADOOP_CONF_DIR > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Eric Badger >Assignee: Jim Brennan >Priority: Major > Labels: Docker > Fix For: 3.1.0 > > Attachments: YARN-7677.001.patch, YARN-7677.002.patch, > YARN-7677.003.patch, YARN-7677.004.patch, YARN-7677.005.patch, > YARN-7677.006.patch, YARN-7677.007.patch > > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
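A sketch of whitelist semantics that would fix this: emit a shell default-expansion for whitelisted variables so a value baked into the Docker image survives. The exact mechanism in the committed patch may differ; the default path below is illustrative:
{code:java}
public class WhitelistEnv {
  // Produces e.g.: export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
  // so a value already set in the container (e.g. by the Docker image) wins,
  // and the NM's value is only a fallback.
  static String exportWhitelisted(String name, String nmDefault) {
    return "export " + name + "=${" + name + ":-" + nmDefault + "}";
  }
}
{code}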
[jira] [Updated] (YARN-5689) Update native services REST API to use agentless docker provider
[ https://issues.apache.org/jira/browse/YARN-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-5689: -- Labels: Docker (was: ) > Update native services REST API to use agentless docker provider > > > Key: YARN-5689 > URL: https://issues.apache.org/jira/browse/YARN-5689 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Gour Saha >Priority: Major > Labels: Docker > Fix For: yarn-native-services > > Attachments: YARN-5689-yarn-native-services.001.patch, > YARN-5689-yarn-native-services.002.patch > > > The initial version of the native services REST API uses the agent provider. > It should be converted to use the new docker provider instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM
[ https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8265: -- Labels: Docker (was: ) > Service AM should retrieve new IP for docker container relaunched by NM > --- > > Key: YARN-8265 > URL: https://issues.apache.org/jira/browse/YARN-8265 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Critical > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8265.001.patch, YARN-8265.002.patch, > YARN-8265.003.patch > > > When a docker container is restarted, it gets a new IP, but the service AM > only retrieves one IP for a container and then cancels the container status > retriever. I suspect the issue would be solved by restarting the retriever > (if it has been canceled) when the onContainerRestart callback is received, > but we'll have to do some testing to make sure this works. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
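A sketch of the proposed fix, assuming per-container bookkeeping of status retrievers; the field and helper names below are hypothetical, not the Service AM's actual API:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ContainerIpTracking {
  interface StatusRetriever { boolean isCancelled(); }

  private final Map<String, StatusRetriever> retrievers =
      new ConcurrentHashMap<>();

  // Invoked from the NM client's onContainerRestart callback.
  void onContainerRestart(String containerId) {
    StatusRetriever r = retrievers.get(containerId);
    if (r == null || r.isCancelled()) {
      // Restart polling so the relaunched container's new IP is retrieved.
      retrievers.put(containerId, scheduleStatusRetriever(containerId));
    }
  }

  private StatusRetriever scheduleStatusRetriever(String containerId) {
    // Hypothetical: kick off periodic getContainerStatus polling here.
    return () -> false;
  }
}
{code}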
[jira] [Updated] (YARN-8284) get_docker_command refactoring
[ https://issues.apache.org/jira/browse/YARN-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8284: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-3611 > get_docker_command refactoring > -- > > Key: YARN-8284 > URL: https://issues.apache.org/jira/browse/YARN-8284 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.2.0, 3.1.1 >Reporter: Jason Lowe >Assignee: Eric Badger >Priority: Minor > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8284.001.patch, YARN-8284.002.patch > > > YARN-8274 occurred because get_docker_command's helper functions each have to > remember to put the docker binary as the first argument. This is error prone > and causes code duplication for each of the helper functions. It would be > safer and simpler if get_docker_command initialized the docker binary > argument in one place and each of the helper functions only added the > arguments specific to their particular docker sub-command. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477941#comment-16477941 ] Jason Lowe commented on YARN-8292: -- I'm still confused about the Resources.isAnyMajorResourceZero(rc, toObtainAfterPreemption) clause in the doPreempt conditional. If we add a rarely-requested resource dimension, it is likely to be often zero in a queue's usage and therefore zero in toObtainAfterPreemption. Resources.isAnyMajorResourceZero(rc, toObtainAfterPreemption) will then always be true, and that seems irrelevant to whether we want to keep preempting or not. If I understand the proposal correctly, I think the check for a zero resource can be dropped, and it simplifies to: the toObtainAfterPreemption vector, component-wise max'd with zero, is less than the amount to obtain from the partition (also max'd with zero). In other words, we want to preempt as long as we have some resources we want to obtain from the partition and preempting the container makes progress on at least one of the resource dimensions being requested from the partition. > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8292.001.patch > > > > This is an example of the problem: (Same if we have more than 2 resources) > > Let's say we have 3 queues A/B/C. All containers with equal size <2,3> > > ||Queue||Guaranteed||Used ||Pending|| > |A|<20, 10>|<20,30>| | > |B|<20, 10>|0|0| > |C|<20, 10>|0|<20, 30>| > | | | | | > > Under current logic, A's calculated to-preempt (how much resource other queues > can preempt) will be <0, 20>. The preemption will not happen. However, under > the context of DRC, queue A is using more resources than guaranteed, so queue > C will be starved -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
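A sketch of the simplified condition being suggested, over per-dimension resource vectors; plain arrays are used here for clarity, whereas the real code works through Resource/ResourceCalculator:
{code:java}
public class PreemptCheck {
  // Keep preempting as long as this container makes progress on at least one
  // dimension still being requested from the partition.
  static boolean shouldPreempt(long[] toObtain, long[] toObtainAfterPreemption) {
    for (int i = 0; i < toObtain.length; i++) {
      long before = Math.max(toObtain[i], 0);
      long after = Math.max(toObtainAfterPreemption[i], 0);
      if (after < before) {
        return true; // progress on dimension i
      }
    }
    return false; // no requested dimension improves; do not preempt
  }
}
{code}
Note that a rarely-requested dimension that is zero in both vectors never satisfies "after < before", so it no longer distorts the decision.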
[jira] [Commented] (YARN-8071) Add ability to specify nodemanager environment variables individually
[ https://issues.apache.org/jira/browse/YARN-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477936#comment-16477936 ] Jim Brennan commented on YARN-8071: --- [~jlowe], thanks for the review: {quote}The changes to TestContainerLaunch#testPrependDistcache appear to be unnecessary? {quote} They were intentional. When I was testing my new test case, I realized that passing the empty set for the {{nmVars}} argument leads to exceptions in {{addToEnvMap()}}, so I fixed the testPrependDistcache() cases as well - I assume this windows-only test must be failing without this fix. > Add ability to specify nodemanager environment variables individually > - > > Key: YARN-8071 > URL: https://issues.apache.org/jira/browse/YARN-8071 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-8071.001.patch, YARN-8071.002.patch, > YARN-8071.003.patch > > > YARN-6830 describes a problem where environment variables that contain commas > cannot be specified via {{-Dmapreduce.map.env}}. > For example: > {{-Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS=/tmp/foo,/tmp/bar"}} > will set {{MOUNTS}} to {{/tmp/foo}} > In that Jira, [~aw] suggested that we change the API to provide a way to > specify environment variables individually, the same way that Spark does. > {quote}Rather than fight with a regex why not redefine the API instead? > > -Dmapreduce.map.env.MODE=bar > -Dmapreduce.map.env.IMAGE_NAME=foo > -Dmapreduce.map.env.MOUNTS=/tmp/foo,/tmp/bar > ... > e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar > This greatly simplifies the input validation needed and makes it clear what > is actually being defined. > {quote} > The mapreduce properties were dealt with in [MAPREDUCE-7069]. This Jira will > address the YARN properties. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
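A sketch of the per-variable property form, turning "prefix.NAME=value" keys into an environment map; it uses Hadoop's Configuration, which is iterable over its entries, and the method name is illustrative:
{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

public class EnvProps {
  // e.g. prefix = "mapreduce.map.env":
  //   mapreduce.map.env.MOUNTS=/tmp/foo,/tmp/bar -> MOUNTS=/tmp/foo,/tmp/bar
  static Map<String, String> parseEnv(Configuration conf, String prefix) {
    Map<String, String> env = new HashMap<>();
    String dotted = prefix + ".";
    for (Map.Entry<String, String> e : conf) {
      if (e.getKey().startsWith(dotted)) {
        env.put(e.getKey().substring(dotted.length()), e.getValue());
      }
    }
    return env;
  }
}
{code}
Because each variable is its own property, commas in a value survive intact; no regex splitting of a combined "K1=V1,K2=V2" string is needed.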
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477933#comment-16477933 ] genericqa commented on YARN-4599: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 43s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 21s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 5m 28s{color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 33s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 26m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 43s{color} | {color:green} root: The patch generated 0 new + 235 unchanged - 1 fixed = 235 total (was 236) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 55s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 48s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}191m 43s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.util.TestDiskChecker | | | hadoop.util.TestReadWriteDiskValidator | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-4599 | | JIRA Patch URL | https:/
[jira] [Commented] (YARN-8123) Skip compiling old hamlet package when the Java version is 10 or upper
[ https://issues.apache.org/jira/browse/YARN-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477930#comment-16477930 ] Dinesh Chitlangia commented on YARN-8123: - [~tasanuma0829] - Thank you for reviewing the patch [~ajisakaa] - Thank you for committing the changes. > Skip compiling old hamlet package when the Java version is 10 or upper > -- > > Key: YARN-8123 > URL: https://issues.apache.org/jira/browse/YARN-8123 > Project: Hadoop YARN > Issue Type: Improvement > Components: webapp > Environment: Java 10 or upper >Reporter: Akira Ajisaka >Assignee: Dinesh Chitlangia >Priority: Major > Labels: newbie > Fix For: 3.2.0 > > Attachments: YARN-8123.001.patch > > > HADOOP-11423 skipped compiling old hamlet package when the Java version is 9, > however, it is not skipped with Java 10+. We need to fix it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org