[jira] [Updated] (YARN-7919) Refactor timelineservice-hbase module into submodules

2018-03-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-7919:

Fix Version/s: 3.2.0
   2.10.0

> Refactor timelineservice-hbase module into submodules
> -
>
> Key: YARN-7919
> URL: https://issues.apache.org/jira/browse/YARN-7919
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineservice
>Affects Versions: 3.0.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Fix For: 2.10.0, 3.2.0
>
> Attachments: YARN-7919-branch-2.05.patch, YARN-7919.00.patch, 
> YARN-7919.01.patch, YARN-7919.02.patch, YARN-7919.03.patch, 
> YARN-7919.04.patch, YARN-7919.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7346) Add a profile to allow optional compilation for ATSv2 with HBase-2.0

2018-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387369#comment-16387369
 ] 

Hudson commented on YARN-7346:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13776 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13776/])
YARN-7346. Add a profile to allow optional compilation for ATSv2 with 
(rohithsharmaks: rev 55ba49dd071b66e72c47a1c41e88b9a5feddf53b)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/pom.xml
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/hadoop-yarn-server-timelineservice-hbase-server-2/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowRunCoprocessor.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/hadoop-yarn-server-timelineservice-hbase-server-2/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/package-info.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/pom.xml
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/hadoop-yarn-server-timelineservice-hbase-server-2/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowScanner.java
* (delete) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowScanner.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/hadoop-yarn-server-timelineservice-hbase-server-1/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/package-info.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client/pom.xml
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/hadoop-yarn-server-timelineservice-hbase-server-1/pom.xml
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/hadoop-yarn-server-timelineservice-hbase-server-1/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/package-info.java
* (edit) hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml
* (delete) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowRunCoprocessor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/TestHBaseStorageFlowRunCompaction.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/hadoop-yarn-server-timelineservice-hbase-server-1/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowScanner.java
* (delete) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/src/assembly/coprocessor.xml
* (delete) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/package-info.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/hadoop-yarn-server-timelineservice-hbase-server-2/pom.xml
* (edit) hadoop-project/pom.xml
* (delete) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/HBaseTimelineServerUtils.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/hadoop-yarn-server-timelineservice-hbase-server-2/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowScannerOperation.java
* (add) 

[jira] [Commented] (YARN-7652) Handle AM register requests asynchronously in FederationInterceptor

2018-03-05 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387355#comment-16387355
 ] 

SammiChen commented on YARN-7652:
-

Hi [~botong], is it still on target for 2.9.1? If not, can we push it out from 
2.9.1 to the next release? 

> Handle AM register requests asynchronously in FederationInterceptor
> ---
>
> Key: YARN-7652
> URL: https://issues.apache.org/jira/browse/YARN-7652
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, federation
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Subru Krishnan
>Assignee: Botong Huang
>Priority: Major
>
> We (cc [~goiri]/[~botong]) observed that the {{FederationInterceptor}} in 
> {{AMRMProxy}} (and consequently the AM) is blocked if the _StateStore_ has 
> outdated info about a _SubCluster_. This is because we handle AM register 
> requests synchronously. This JIRA proposes moving to asynchronous handling, 
> similar to how we handle allocate invocations.
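
A minimal sketch of the asynchronous hand-off pattern the description refers 
to (illustrative only; the class and method names below are hypothetical and 
not taken from the FederationInterceptor patch):
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncRegisterSketch {
  private final ExecutorService pool = Executors.newSingleThreadExecutor();

  // Return immediately; the (possibly slow) sub-cluster registration runs in
  // the background, mirroring how allocate invocations are already handled.
  public CompletableFuture<String> registerApplicationMaster(String appAttemptId) {
    return CompletableFuture.supplyAsync(
        () -> registerWithSubCluster(appAttemptId), pool);
  }

  // Placeholder for the real UAM/sub-cluster registration work.
  private String registerWithSubCluster(String appAttemptId) {
    return "registered-" + appAttemptId;
  }
}
{code}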



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8002) Support NOT_SELF and ALL namespace types for allocation tag

2018-03-05 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8002:
--
Description: 
This is a follow-up task to YARN-7972, which adds support for specifying tags 
with the SELF and APP_ID namespaces, like the following:
 * self/
 * app-id//

This task tracks the work to support 2 of the remaining namespace types, 
*NOT_SELF* & *ALL* (we'll support app-label later):
 * not-self/
 * all/

This will require a bit of refactoring in {{AllocationTagsManager}}, as it 
needs to do some proper aggregation of tags across multiple apps.

  was:
This is a follow-up task to YARN-7972, which adds support for specifying tags 
with the SELF and APP_ID namespaces, like the following:
 * self/
 * app-id//

This task tracks the work to support 2 of the remaining namespace types, 
*NOT_SELF* & *ALL* (we'll support app-label later):
 * not-self/
 * all/

This will require a bit of refactoring in {{AllocationTagsManager}}, as it 
needs to do some proper aggregation of tags across multiple apps.

 

 


> Support NOT_SELF and ALL namespace types for allocation tag
> ---
>
> Key: YARN-8002
> URL: https://issues.apache.org/jira/browse/YARN-8002
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8002.001.patch
>
>
> This is a follow-up task to YARN-7972, which adds support for specifying 
> tags with the SELF and APP_ID namespaces, like the following:
>  * self/
>  * app-id//
> This task tracks the work to support 2 of the remaining namespace types, 
> *NOT_SELF* & *ALL* (we'll support app-label later):
>  * not-self/
>  * all/
> This will require a bit of refactoring in {{AllocationTagsManager}}, as it 
> needs to do some proper aggregation of tags across multiple apps.
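
To illustrate the kind of cross-application aggregation mentioned above, here 
is a small sketch; it is not the actual {{AllocationTagsManager}} code and all 
names below are hypothetical:
{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class TagAggregationSketch {
  // appId -> (tag -> number of containers carrying that tag)
  private final Map<String, Map<String, Long>> perAppTags = new HashMap<>();

  public void addTag(String appId, String tag) {
    perAppTags.computeIfAbsent(appId, a -> new HashMap<>())
        .merge(tag, 1L, Long::sum);
  }

  // "all" namespace: cardinality of a tag across every application.
  public long cardinalityAll(String tag) {
    return perAppTags.values().stream()
        .mapToLong(m -> m.getOrDefault(tag, 0L)).sum();
  }

  // "not-self" namespace: cardinality across all applications except selfAppId.
  public long cardinalityNotSelf(String selfAppId, String tag) {
    return cardinalityAll(tag)
        - perAppTags.getOrDefault(selfAppId, Collections.emptyMap())
            .getOrDefault(tag, 0L);
  }
}
{code}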



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8002) Support NOT_SELF and ALL namespace types for allocation tag

2018-03-05 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8002:
--
Attachment: YARN-8002.001.patch

> Support NOT_SELF and ALL namespace types for allocation tag
> ---
>
> Key: YARN-8002
> URL: https://issues.apache.org/jira/browse/YARN-8002
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8002.001.patch
>
>
> This is a follow-up task to YARN-7972, which adds support for specifying 
> tags with the SELF and APP_ID namespaces, like the following:
>  * self/
>  * app-id//
> This task tracks the work to support 2 of the remaining namespace types, 
> *NOT_SELF* & *ALL* (we'll support app-label later):
>  * not-self/
>  * all/
> This will require a bit of refactoring in {{AllocationTagsManager}}, as it 
> needs to do some proper aggregation of tags across multiple apps.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7346) Add a profile to allow optional compilation for ATSv2 with HBase-2.0

2018-03-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-7346:

Fix Version/s: 3.2.0

> Add a profile to allow optional compilation for ATSv2 with HBase-2.0
> 
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Haibo Chen
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7346.00.patch, YARN-7346.01.patch, 
> YARN-7346.02.patch, YARN-7346.03-incremental.patch, YARN-7346.03.patch, 
> YARN-7346.04-incremental.patch, YARN-7346.04.patch, YARN-7346.05.patch, 
> YARN-7346.06.patch, YARN-7346.07.patch, YARN-7346.08-incremental.patch, 
> YARN-7346.08.patch, YARN-7346.09.patch, YARN-7346.10.patch, 
> YARN-7346.11.patch, YARN-7346.prelim1.patch, YARN-7346.prelim2.patch, 
> YARN-7581.prelim.patch, 
> hadoop-yarn-server-timelineservice-hbase-server-1-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-1-javadoc-report.txt, 
> hadoop-yarn-server-timelineservice-hbase-server-2-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-2-javadoc-report.txt
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> [https://pastebin.com/Ms4jYEVB]
> This issue is to fix the compilation errors.
> The scope of this JIRA is to add a profile to allow optional compilation for 
> ATSv2 with HBase 2.0. The default compilation for trunk will still be for 
> HBase 1.2.6.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7601) Incorrect container states recovered as LevelDB uses alphabetical order

2018-03-05 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387353#comment-16387353
 ] 

SammiChen commented on YARN-7601:
-

Hi [~sampada15], is it still on target for 2.9.1? If not, can we push it out 
from 2.9.1 to the next release? 

> Incorrect container states recovered as LevelDB uses alphabetical order
> ---
>
> Key: YARN-7601
> URL: https://issues.apache.org/jira/browse/YARN-7601
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sampada Dehankar
>Assignee: Sampada Dehankar
>Priority: Major
> Attachments: YARN-7601.001.patch, YARN-7601.002.patch
>
>
> LevelDB stores key-value pairs in alphabetical order. The container id 
> concatenated with its state is used as the key. So, no matter which states a 
> container goes through in its life cycle, the order of the state values 
> retrieved from LevelDB is always going to be as below:
> LAUNCHED
> PAUSED
> QUEUED
> For example, if a container is LAUNCHED, then PAUSED, and LAUNCHED again, the 
> recovered container state is currently PAUSED instead of LAUNCHED.
> We propose to store the timestamp as the value while making calls to 
>   
>   storeContainerLaunched
>   storeContainerPaused
>   storeContainerQueued
>   
> so that the correct container state is recovered based on timestamps.
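
A minimal sketch of the timestamp-based recovery idea described above 
(illustrative only, not the actual NM state-store code):
{code:java}
import java.util.HashMap;
import java.util.Map;

public class LatestStateSketch {
  // Pick the state with the largest timestamp, independent of LevelDB key order.
  public static String latestState(Map<String, Long> stateToTimestamp) {
    String latest = null;
    long latestTs = Long.MIN_VALUE;
    for (Map.Entry<String, Long> e : stateToTimestamp.entrySet()) {
      if (e.getValue() > latestTs) {
        latestTs = e.getValue();
        latest = e.getKey();
      }
    }
    return latest;
  }

  public static void main(String[] args) {
    // LAUNCHED -> PAUSED -> LAUNCHED again: recovery should yield LAUNCHED.
    Map<String, Long> recovered = new HashMap<>();
    recovered.put("PAUSED", 200L);
    recovered.put("LAUNCHED", 300L);  // the re-launch stored a later timestamp
    System.out.println(latestState(recovered));  // prints LAUNCHED
  }
}
{code}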



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7346) Add a profile to allow optional compilation for ATSv2 with HBase-2.0

2018-03-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387352#comment-16387352
 ] 

Rohith Sharma K S commented on YARN-7346:
-

I committed this to trunk. Thanks [~haibochen] for the patch! Now you can 
cherry-pick it into branch-2. 

> Add a profile to allow optional compilation for ATSv2 with HBase-2.0
> 
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7346.00.patch, YARN-7346.01.patch, 
> YARN-7346.02.patch, YARN-7346.03-incremental.patch, YARN-7346.03.patch, 
> YARN-7346.04-incremental.patch, YARN-7346.04.patch, YARN-7346.05.patch, 
> YARN-7346.06.patch, YARN-7346.07.patch, YARN-7346.08-incremental.patch, 
> YARN-7346.08.patch, YARN-7346.09.patch, YARN-7346.10.patch, 
> YARN-7346.11.patch, YARN-7346.prelim1.patch, YARN-7346.prelim2.patch, 
> YARN-7581.prelim.patch, 
> hadoop-yarn-server-timelineservice-hbase-server-1-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-1-javadoc-report.txt, 
> hadoop-yarn-server-timelineservice-hbase-server-2-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-2-javadoc-report.txt
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> [https://pastebin.com/Ms4jYEVB]
> This issue is to fix the compilation errors.
> The scope of this JIRA is to add a profile to allow optional compilation for 
> ATSv2 with HBase 2.0. The default compilation for trunk will still be for 
> HBase 1.2.6.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7346) Add a profile to allow optional compilation for ATSv2 with HBase-2.0

2018-03-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387338#comment-16387338
 ] 

Rohith Sharma K S commented on YARN-7346:
-

I haven't committed yet. I will do it now. 

> Add a profile to allow optional compilation for ATSv2 with HBase-2.0
> 
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7346.00.patch, YARN-7346.01.patch, 
> YARN-7346.02.patch, YARN-7346.03-incremental.patch, YARN-7346.03.patch, 
> YARN-7346.04-incremental.patch, YARN-7346.04.patch, YARN-7346.05.patch, 
> YARN-7346.06.patch, YARN-7346.07.patch, YARN-7346.08-incremental.patch, 
> YARN-7346.08.patch, YARN-7346.09.patch, YARN-7346.10.patch, 
> YARN-7346.11.patch, YARN-7346.prelim1.patch, YARN-7346.prelim2.patch, 
> YARN-7581.prelim.patch, 
> hadoop-yarn-server-timelineservice-hbase-server-1-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-1-javadoc-report.txt, 
> hadoop-yarn-server-timelineservice-hbase-server-2-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-2-javadoc-report.txt
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> [https://pastebin.com/Ms4jYEVB]
> This issue is to fix the compilation errors.
> The scope of this JIRA is to add a profile to allow optional compilation for 
> ATSv2 with HBase 2.0. The default compilation for trunk will still be for 
> HBase 1.2.6.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7657) Queue Mapping could provide options to provide 'user' specific auto-created queues under a specified group parent queue

2018-03-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387308#comment-16387308
 ] 

genericqa commented on YARN-7657:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 26s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 19 new + 111 unchanged - 0 fixed = 130 total (was 111) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 44s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 41s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}122m 31s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoCreatedQueuePreemption
 |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-7657 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913142/YARN-7657.2.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2e4246276adf 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 745190e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/19895/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 

[jira] [Commented] (YARN-7346) Add a profile to allow optional compilation for ATSv2 with HBase-2.0

2018-03-05 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387306#comment-16387306
 ] 

Haibo Chen commented on YARN-7346:
--

Sure. Are you able to commit it to trunk? I can try to cherry-pick from trunk 
and resolve conflicts, which is easier than trying from the patch directly.

> Add a profile to allow optional compilation for ATSv2 with HBase-2.0
> 
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7346.00.patch, YARN-7346.01.patch, 
> YARN-7346.02.patch, YARN-7346.03-incremental.patch, YARN-7346.03.patch, 
> YARN-7346.04-incremental.patch, YARN-7346.04.patch, YARN-7346.05.patch, 
> YARN-7346.06.patch, YARN-7346.07.patch, YARN-7346.08-incremental.patch, 
> YARN-7346.08.patch, YARN-7346.09.patch, YARN-7346.10.patch, 
> YARN-7346.11.patch, YARN-7346.prelim1.patch, YARN-7346.prelim2.patch, 
> YARN-7581.prelim.patch, 
> hadoop-yarn-server-timelineservice-hbase-server-1-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-1-javadoc-report.txt, 
> hadoop-yarn-server-timelineservice-hbase-server-2-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-2-javadoc-report.txt
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> [https://pastebin.com/Ms4jYEVB]
> This issue is to fix the compilation errors.
> The scope of this JIRA is to add a profile to allow optional compilation for 
> ATSv2 with HBase 2.0. The default compilation for trunk will still be for 
> HBase 1.2.6.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7997) Add RM HA state in jmx

2018-03-05 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt resolved YARN-7997.

Resolution: Duplicate

Thank you [~rohithsharma] for pointing to the JIRA. Could you please rebase 
the patch and add a test case if possible?

> Add RM HA state in jmx 
> ---
>
> Key: YARN-7997
> URL: https://issues.apache.org/jira/browse/YARN-7997
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Priority: Minor
>
> Currently, the RM /jmx interface provides no option to find out the HA state 
> of each RM.
> We need an interface, similar to what the NameNode's {{FSNamesystem}} bean 
> provides, to know each RM's state.
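
As a rough sketch, any Hadoop daemon's /jmx endpoint can be read over HTTP; 
the host below is a placeholder, and the "HAState" attribute name is only an 
assumption modeled on the NameNode's {{FSNamesystem}} bean, since the RM does 
not expose it yet (which is the point of this JIRA):
{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RmJmxSketch {
  public static void main(String[] args) throws Exception {
    // Hostname is a placeholder; 8088 is the usual RM web port.
    URL url = new URL("http://rm-host.example.com:8088/jmx");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        // Hypothetical attribute name; print any line that mentions it.
        if (line.contains("HAState")) {
          System.out.println(line.trim());
        }
      }
    }
  }
}
{code}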



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7346) Add a profile to allow optional compilation for ATSv2 with HBase-2.0

2018-03-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387287#comment-16387287
 ] 

Rohith Sharma K S commented on YARN-7346:
-

[~haibochen], there are a few conflicts for branch-2. Would you provide a 
patch for branch-2? 

> Add a profile to allow optional compilation for ATSv2 with HBase-2.0
> 
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7346.00.patch, YARN-7346.01.patch, 
> YARN-7346.02.patch, YARN-7346.03-incremental.patch, YARN-7346.03.patch, 
> YARN-7346.04-incremental.patch, YARN-7346.04.patch, YARN-7346.05.patch, 
> YARN-7346.06.patch, YARN-7346.07.patch, YARN-7346.08-incremental.patch, 
> YARN-7346.08.patch, YARN-7346.09.patch, YARN-7346.10.patch, 
> YARN-7346.11.patch, YARN-7346.prelim1.patch, YARN-7346.prelim2.patch, 
> YARN-7581.prelim.patch, 
> hadoop-yarn-server-timelineservice-hbase-server-1-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-1-javadoc-report.txt, 
> hadoop-yarn-server-timelineservice-hbase-server-2-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-2-javadoc-report.txt
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> [https://pastebin.com/Ms4jYEVB]
> This issue is to fix the compilation errors.
> The scope of this JIRA is to add a profile to allow optional compilation for 
> ATSv2 with HBase 2.0. The default compilation for trunk will still be for 
> HBase 1.2.6.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7346) Add a profile to allow optional compilation for ATSv2 with HBase-2.0

2018-03-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387280#comment-16387280
 ] 

Rohith Sharma K S commented on YARN-7346:
-

committing shortly

> Add a profile to allow optional compilation for ATSv2 with HBase-2.0
> 
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7346.00.patch, YARN-7346.01.patch, 
> YARN-7346.02.patch, YARN-7346.03-incremental.patch, YARN-7346.03.patch, 
> YARN-7346.04-incremental.patch, YARN-7346.04.patch, YARN-7346.05.patch, 
> YARN-7346.06.patch, YARN-7346.07.patch, YARN-7346.08-incremental.patch, 
> YARN-7346.08.patch, YARN-7346.09.patch, YARN-7346.10.patch, 
> YARN-7346.11.patch, YARN-7346.prelim1.patch, YARN-7346.prelim2.patch, 
> YARN-7581.prelim.patch, 
> hadoop-yarn-server-timelineservice-hbase-server-1-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-1-javadoc-report.txt, 
> hadoop-yarn-server-timelineservice-hbase-server-2-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-2-javadoc-report.txt
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> [https://pastebin.com/Ms4jYEVB]
> This issue is to fix the compilation errors.
> The scope of this JIRA is to add a profile to allow optional compilation for 
> ATSv2 with HBase 2.0. The default compilation for trunk will still be for 
> HBase 1.2.6.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-03-05 Thread Chen Qingcha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: hadoop-2.7.2-gpu-port.patch

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2-gpu-port.patch, hadoop-2.7.2-gpu.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, but it treats GPUs as a 
> countable resource. 
> However, GPU placement is also very important to deep learning jobs for 
> better efficiency.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than running 
> on GPUs {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 
> are not.
>  We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU 
> usage and locality information on a node (up to 64 GPUs per node): '1' means 
> available and '0' otherwise in the corresponding bit position.
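
A minimal sketch of the 64-bit GPU availability bitmap described above 
(illustrative only, not code from the attached patches):
{code:java}
public class GpuBitmapSketch {
  // Bit i set to 1 means GPU i on the node is free.
  public static boolean isAvailable(long bitmap, int gpu) {
    return ((bitmap >>> gpu) & 1L) == 1L;
  }

  public static int availableCount(long bitmap) {
    return Long.bitCount(bitmap);
  }

  public static long allocate(long bitmap, int gpu) {
    return bitmap & ~(1L << gpu);  // clear the bit: mark the GPU as used
  }

  public static void main(String[] args) {
    long bitmap = 0b1000_0011L;        // GPUs 0, 1 and 7 are free
    System.out.println(isAvailable(bitmap, 1));                // true
    System.out.println(availableCount(bitmap));                // 3
    System.out.println(isAvailable(allocate(bitmap, 1), 1));   // false
  }
}
{code}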



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7657) Queue Mapping could provide options to provide 'user' specific auto-created queues under a specified group parent queue

2018-03-05 Thread Suma Shivaprasad (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387203#comment-16387203
 ] 

Suma Shivaprasad commented on YARN-7657:


Thanks [~Zian Chen] for reviewing the patch. I reverted the timeout-removal 
changes.

[~wangda], can you please review and commit it if it looks okay?

> Queue Mapping could provide options to provide 'user' specific auto-created 
> queues under a specified group parent queue
> ---
>
> Key: YARN-7657
> URL: https://issues.apache.org/jira/browse/YARN-7657
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-7657.1.patch, YARN-7657.2.patch
>
>
> The current queue mapping only provides %user as an option for user-specific 
> queues, as in u:%user:%user. We could also support %user with a group, as in 
> 'g:marketing-group:marketing.%user', so that user-specific queues are 
> automatically created under the group's parent queue in this case.
> cc [~leftnoteasy]
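
To illustrate, a small sketch of how such a group-based mapping could expand 
into a user-specific leaf queue; the parsing below is hypothetical and not 
CapacityScheduler code:
{code:java}
public class QueueMappingSketch {
  // Mapping format assumed here: g:<group>:<parent-queue>.%user
  public static String expand(String mapping, String user) {
    String[] parts = mapping.split(":");
    String queueTemplate = parts[2];
    return queueTemplate.replace("%user", user);
  }

  public static void main(String[] args) {
    System.out.println(expand("g:marketing-group:marketing.%user", "alice"));
    // marketing.alice -> leaf queue "alice" auto-created under "marketing"
  }
}
{code}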



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-03-05 Thread Chen Qingcha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: (was: hadoop-2.7.2-gpu-port.patch)

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2-gpu.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, but it treats GPUs as a 
> countable resource. 
> However, GPU placement is also very important to deep learning jobs for 
> better efficiency.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than running 
> on GPUs {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 
> are not.
>  We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU 
> usage and locality information on a node (up to 64 GPUs per node): '1' means 
> available and '0' otherwise in the corresponding bit position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7657) Queue Mapping could provide options to provide 'user' specific auto-created queues under a specified group parent queue

2018-03-05 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-7657:
---
Attachment: YARN-7657.2.patch

> Queue Mapping could provide options to provide 'user' specific auto-created 
> queues under a specified group parent queue
> ---
>
> Key: YARN-7657
> URL: https://issues.apache.org/jira/browse/YARN-7657
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-7657.1.patch, YARN-7657.2.patch
>
>
> The current queue mapping only provides %user as an option for user-specific 
> queues, as in u:%user:%user. We could also support %user with a group, as in 
> 'g:marketing-group:marketing.%user', so that user-specific queues are 
> automatically created under the group's parent queue in this case.
> cc [~leftnoteasy]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7626) Allow regular expression matching in container-executor.cfg for devices and named docker volumes mount

2018-03-05 Thread Zian Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387169#comment-16387169
 ] 

Zian Chen commented on YARN-7626:
-

[~leftnoteasy], I think that after fixing the style issue according to Miklos's 
latest comments, the latest patch should be good. Could you do a final review 
and commit the patch if there are no other issues? Really appreciate your help!

[~miklos.szeg...@cloudera.com], I really appreciate your detailed suggestions 
for the code changes!

> Allow regular expression matching in container-executor.cfg for devices and 
> named docker volumes mount
> --
>
> Key: YARN-7626
> URL: https://issues.apache.org/jira/browse/YARN-7626
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-7626.001.patch, YARN-7626.002.patch, 
> YARN-7626.003.patch, YARN-7626.004.patch, YARN-7626.005.patch, 
> YARN-7626.006.patch, YARN-7626.007.patch, YARN-7626.008.patch, 
> YARN-7626.009.patch, YARN-7626.010.patch, YARN-7626.011.patch
>
>
> Currently, when we configure some of the GPU-device-related fields (like ) in 
> container-executor.cfg, these fields are generated based on different driver 
> versions or GPU device names. We want to enable regular expression matching 
> so that users don't need to manually set up these fields when configuring 
> container-executor.cfg.
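
A minimal sketch of the regular-expression matching idea; the allow-list 
entries below are hypothetical examples under an assumed matching rule, not 
values from the patch:
{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class DeviceRegexSketch {
  // A requested device or volume name is allowed if any configured pattern
  // matches it in full.
  public static boolean isAllowed(String requested, List<String> allowedPatterns) {
    for (String p : allowedPatterns) {
      if (Pattern.matches(p, requested)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Hypothetical allow-list entries written as regular expressions.
    List<String> allowed = Arrays.asList("/dev/nvidia[0-9]+", "nvidia_driver_.*");
    System.out.println(isAllowed("/dev/nvidia3", allowed));          // true
    System.out.println(isAllowed("nvidia_driver_375.66", allowed));  // true
    System.out.println(isAllowed("/dev/sda1", allowed));             // false
  }
}
{code}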



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7626) Allow regular expression matching in container-executor.cfg for devices and named docker volumes mount

2018-03-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387123#comment-16387123
 ] 

genericqa commented on YARN-7626:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
26m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
55s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-7626 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913126/YARN-7626.011.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux fadf7a03e941 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4971276 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19894/testReport/ |
| Max. process+thread count | 435 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19894/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Allow regular expression matching in container-executor.cfg for devices and 
> named docker volumes mount
> --
>
> Key: YARN-7626
> URL: https://issues.apache.org/jira/browse/YARN-7626
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-7626.001.patch, YARN-7626.002.patch, 
> YARN-7626.003.patch, YARN-7626.004.patch, YARN-7626.005.patch, 
> YARN-7626.006.patch, YARN-7626.007.patch, YARN-7626.008.patch, 
> YARN-7626.009.patch, YARN-7626.010.patch, YARN-7626.011.patch
>
>
> Currently when we config some of the GPU devices 

[jira] [Commented] (YARN-7891) LogAggregationIndexedFileController should support HAR file

2018-03-05 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387117#comment-16387117
 ] 

Xuan Gong commented on YARN-7891:
-

[~leftnoteasy]

Yes, that is correct. As I mentioned previously, we do have that corner case, 
but we will fix it in YARN-7952.

> LogAggregationIndexedFileController should support HAR file
> ---
>
> Key: YARN-7891
> URL: https://issues.apache.org/jira/browse/YARN-7891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7891.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8000) Yarn Service: component instance name shows up as component name in container record

2018-03-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387070#comment-16387070
 ] 

genericqa commented on YARN-8000:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
58s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 18s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications: The patch generated 2 
new + 19 unchanged - 1 fixed = 21 total (was 20) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 34s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
46s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
31s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 55m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8000 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913124/YARN-8000.003.patch |
| Optional Tests |  asflicense  mvnsite  compile  javac  javadoc  mvninstall  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a8eb7e4fdaf2 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4971276 |
| maven | 

[jira] [Updated] (YARN-7626) Allow regular expression matching in container-executor.cfg for devices and named docker volumes mount

2018-03-05 Thread Zian Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-7626:

Attachment: YARN-7626.011.patch

> Allow regular expression matching in container-executor.cfg for devices and 
> named docker volumes mount
> --
>
> Key: YARN-7626
> URL: https://issues.apache.org/jira/browse/YARN-7626
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-7626.001.patch, YARN-7626.002.patch, 
> YARN-7626.003.patch, YARN-7626.004.patch, YARN-7626.005.patch, 
> YARN-7626.006.patch, YARN-7626.007.patch, YARN-7626.008.patch, 
> YARN-7626.009.patch, YARN-7626.010.patch, YARN-7626.011.patch
>
>
> Currently, when we configure some of the GPU-device-related fields (like ) in 
> container-executor.cfg, these fields are generated based on different driver 
> versions or GPU device names. We want to enable regular expression matching 
> so that users don't need to manually set up these fields when configuring 
> container-executor.cfg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8000) Yarn Service: component instance name shows up as component name in container record

2018-03-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387041#comment-16387041
 ] 

genericqa commented on YARN-8000:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 14s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications: The patch generated 1 
new + 20 unchanged - 0 fixed = 21 total (was 20) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
37s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
28s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8000 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913122/YARN-8000.002.patch |
| Optional Tests |  asflicense  mvnsite  compile  javac  javadoc  mvninstall  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9e7956321b9e 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4971276 |
| maven | 

[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2018-03-05 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387021#comment-16387021
 ] 

Miklos Szegedi commented on YARN-5764:
--

Thank you, [~devaraj.k] for the updated patch.
{code:java}
3599public static final String NM_NUMA_AWARENESS_NODE_MEMORY = NM_PREFIX
3600+ "numa-awareness..memory";
3601public static final String NM_NUMA_AWARENESS_NODE_CPUS = NM_PREFIX
3602+ "numa-awareness..cpus";{code}
These two lines are no-op, they can probably be omitted.
{code:java}
yarn.nodemanager.numa-awareness.1.memory
{code}
Optional: Is there an example of an asymmetric NUMA architecture? It might 
make sense in the future to define nodes once and specify a multiplier, so 
that we can make the configuration easier.
{code:java}
145 String[] args = new String[] {"numactl", "--hardware"};{code}
This should be {{/usr/bin/numactl}} for security reasons. In fact, shouldn't 
it use the configured numactl path?
I think {{recoverCpus}} and {{recoverMemory}} can be eliminated. You could just 
create a Resource object and use assignResources.
{code}
213 NumaResourceAllocation numaNode = allocate(containerId, resource);
{code}
This is a little bit misleading: allocate may return multiple allocations on 
multiple nodes, not just a single numaNode.
I have a question: {{recoverNumaResource}} reallocates the resources based on 
the registered values. Where are those resources released? It looks like 
testRecoverNumaResource() does not test a container allocation, release, and 
relaunch cycle, but rather the opposite direction. What is the reason for that?
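Regarding the numactl invocation above, a minimal sketch of reading the binary 
location from configuration instead of hard-coding it; the property name and the 
default path below are assumptions for illustration, not necessarily what the 
patch defines:
{code:java}
// Sketch only: resolve the numactl binary from NM configuration (property name assumed),
// falling back to an absolute path rather than relying on PATH lookup.
String numactlCmd = conf.get(
    "yarn.nodemanager.numa-awareness.numactl.cmd", "/usr/bin/numactl");
String[] args = new String[] {numactlCmd, "--hardware"};
{code}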

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>Priority: Major
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch, YARN-5764-v4.patch, YARN-5764-v5.patch, 
> YARN-5764-v6.patch, YARN-5764-v7.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8002) Support NOT_SELF and ALL namespace types for allocation tag

2018-03-05 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8002:
--
Description: 
This is a continuation task after YARN-7972. YARN-7972 adds support to specify 
tags with namespace SELF and APP_ID, like the following:
 * self/
 * app-id//

This task tracks the work to support two of the remaining namespace types (we'll 
support app-label later):
 * not-self/
 * all/

This will require a bit of refactoring in {{AllocationTagsManager}}, as it needs 
to do some proper aggregation on tags for multiple apps.

 

 

  was:
This is a continuation task after YARN-7972. After YARN-7972, tags can be 
specified with namespace SELF and APP_ID, like the following:
 * self/
 * app-id//

This task tracks the work to support two of the remaining namespace types (we'll 
support app-label later):
 * not-self/
 * all/

This will require a bit of refactoring in {{AllocationTagsManager}}, as it needs 
to do some proper aggregation on tags for multiple apps.

 

 


> Support NOT_SELF and ALL namespace types for allocation tag
> ---
>
> Key: YARN-8002
> URL: https://issues.apache.org/jira/browse/YARN-8002
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>
> This is a continuation task after YARN-7972. YARN-7972 adds support to specify 
> tags with namespace SELF and APP_ID, like the following:
>  * self/
>  * app-id//
> This task tracks the work to support two of the remaining namespace types 
> (we'll support app-label later):
>  * not-self/
>  * all/
> This will require a bit of refactoring in {{AllocationTagsManager}}, as it 
> needs to do some proper aggregation on tags for multiple apps.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8002) Support NOT_SELF and ALL namespace types for allocation tag

2018-03-05 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8002:
--
Description: 
This is a continuation task after YARN-7972. YARN-7972 adds support to specify 
tags with namespace SELF and APP_ID, like the following:
 * self/
 * app-id//

This task tracks the work to support two of the remaining namespace types, 
*NOT_SELF* and *ALL* (we'll support app-label later):
 * not-self/
 * all/

This will require a bit of refactoring in {{AllocationTagsManager}}, as it needs 
to do some proper aggregation on tags for multiple apps.

 

 

  was:
This is a continuation task after YARN-7972. YARN-7972 adds support to specify 
tags with namespace SELF and APP_ID, like the following:
 * self/
 * app-id//

This task tracks the work to support two of the remaining namespace types (we'll 
support app-label later):
 * not-self/
 * all/

This will require a bit of refactoring in {{AllocationTagsManager}}, as it needs 
to do some proper aggregation on tags for multiple apps.

 

 


> Support NOT_SELF and ALL namespace types for allocation tag
> ---
>
> Key: YARN-8002
> URL: https://issues.apache.org/jira/browse/YARN-8002
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>
> This is a continuation task after YARN-7972. YARN-7972 adds support to specify 
> tags with namespace SELF and APP_ID, like the following:
>  * self/
>  * app-id//
> This task tracks the work to support two of the remaining namespace types, 
> *NOT_SELF* and *ALL* (we'll support app-label later):
>  * not-self/
>  * all/
> This will require a bit of refactoring in {{AllocationTagsManager}}, as it 
> needs to do some proper aggregation on tags for multiple apps.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8000) Yarn Service: component instance name shows up as component name in container record

2018-03-05 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387000#comment-16387000
 ] 

Gour Saha commented on YARN-8000:
-

+1 for 003 patch. /cc [~billie.rinaldi]

> Yarn Service: component instance name shows up as component name in container 
> record 
> -
>
> Key: YARN-8000
> URL: https://issues.apache.org/jira/browse/YARN-8000
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8000.001.patch, YARN-8000.002.patch, 
> YARN-8000.003.patch
>
>
> Yarn Service: component instance name shows up as component name in container 
> record 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8002) Support NOT_SELF and ALL namespace types for allocation tag

2018-03-05 Thread Weiwei Yang (JIRA)
Weiwei Yang created YARN-8002:
-

 Summary: Support NOT_SELF and ALL namespace types for allocation 
tag
 Key: YARN-8002
 URL: https://issues.apache.org/jira/browse/YARN-8002
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Weiwei Yang
Assignee: Weiwei Yang


This is a continuation task after YARN-7972. After YARN-7972, tags can be 
specified with namespace SELF and APP_ID, like the following:
 * self/
 * app-id//

This task tracks the work to support two of the remaining namespace types (we'll 
support app-label later):
 * not-self/
 * all/

This will require a bit of refactoring in {{AllocationTagsManager}}, as it needs 
to do some proper aggregation on tags for multiple apps.
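For illustration only, a placement constraint using one of these namespaces might 
look like the sketch below; the helper name {{allocationTagWithNamespace}} and the 
tag value "worker" are assumptions based on the description above, not a confirmed 
API:
{code:java}
// Sketch: anti-affinity on a node against "worker" containers from any application
// (namespace all/), assuming a PlacementTargets helper that accepts a namespace prefix.
PlacementConstraint pc = PlacementConstraints.targetNotIn(
    PlacementConstraints.NODE,
    PlacementConstraints.PlacementTargets.allocationTagWithNamespace("all", "worker"))
    .build();
{code}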

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8000) Yarn Service: component instance name shows up as component name in container record

2018-03-05 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8000:

Attachment: YARN-8000.003.patch

> Yarn Service: component instance name shows up as component name in container 
> record 
> -
>
> Key: YARN-8000
> URL: https://issues.apache.org/jira/browse/YARN-8000
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8000.001.patch, YARN-8000.002.patch, 
> YARN-8000.003.patch
>
>
> Yarn Service: component instance name shows up as component name in container 
> record 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8000) Yarn Service: component instance name shows up as component name in container record

2018-03-05 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386989#comment-16386989
 ] 

Gour Saha commented on YARN-8000:
-

Sorry I missed this -

In the description of {{component_instance_name}} in the Swagger definition 
(YARN-Simplified-V1-API-Layer-For-Services.yaml) we have this - "Name of the 
component instance that this container instance belongs to."

In the comments in Container.java, it says "Name of the component that this 
container instance belongs to."

Let's change the two comments in Container.java to match the one in the yaml file 
and avoid ambiguity - we should say "component instance" instead of "component".
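For reference, the suggested wording applied as a javadoc-style comment would read 
roughly as below (a sketch only; the field name is taken from the builder snippet 
discussed on this JIRA and may differ in the actual file):
{code:java}
/**
 * Name of the component instance that this container instance belongs to.
 */
private String componentInstanceName;
{code}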

> Yarn Service: component instance name shows up as component name in container 
> record 
> -
>
> Key: YARN-8000
> URL: https://issues.apache.org/jira/browse/YARN-8000
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8000.001.patch, YARN-8000.002.patch
>
>
> Yarn Service: component instance name shows up as component name in container 
> record 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8001) Newly created Yarn application ID lost after RM failover

2018-03-05 Thread shanyu zhao (JIRA)
shanyu zhao created YARN-8001:
-

 Summary: Newly created Yarn application ID lost after RM failover
 Key: YARN-8001
 URL: https://issues.apache.org/jira/browse/YARN-8001
 Project: Hadoop YARN
  Issue Type: Bug
  Components: RM
Affects Versions: 2.9.0, 2.7.3
Reporter: shanyu zhao


I’ve seen a problem in Hadoop 2.7.3 where a newly submitted YARN application was 
lost after an RM failover. It looks like when handling application submission, 
the RM does not write it to the state store (we are using a ZooKeeper-based state 
store) before it responds to the client. Later the RM failed over to another RM 
and all write calls to the state store failed. The new RM recovers state from the 
state store, and this app is lost.

 

The symptom is an error message on the client side claiming that a previously 
submitted application ID does not exist:

2018-02-22 14:54:50,258 [JobControl] WARN  
org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider - 
Invocation returned exception on [rm1] : 
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
with id 'application_1519310222933_0160' doesn't exist in RM. Please check that 
the job submission was successful.

 

This is a timeline excerpted from the resource manager logs:

2018-02-22 14:54:06.7685260    headnode1    Storing application with id 
application_1519310222933_0160

2018-02-22 14:54:06.7685660    headnode1  
application_1519310222933_0160 State change from NEW to NEW_SAVING

2018-02-22 14:54:17.8924760    headnode1    Transitioning to standby state

2018-02-22 14:54:30.3951160    headnode0    Transitioning to active state



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7972) Support inter-app placement constraints for allocation tags by application ID

2018-03-05 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386963#comment-16386963
 ] 

Weiwei Yang commented on YARN-7972:
---

Thanks for the review [~asuresh], [~leftnoteasy]!

> Support inter-app placement constraints for allocation tags by application ID
> -
>
> Key: YARN-7972
> URL: https://issues.apache.org/jira/browse/YARN-7972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7972.001.patch, YARN-7972.002.patch, 
> YARN-7972.003.patch, YARN-7972.004.patch, YARN-7972.005.patch, 
> YARN-7972.006.patch, YARN-7972.007.patch
>
>
> Per discussion in [this 
> comment|https://issues.apache.org/jira/browse/YARN-6599focusedCommentId=16319662=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16319662]
>  in  YARN-6599, we need to support inter-app PC for allocation tags.
> This will help to do better placement when dealing with potential competing 
> resource applications, e.g don't place two tensorflow workers from two 
> different applications on one same node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8000) Yarn Service: component instance name shows up as component name in container record

2018-03-05 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8000:

Attachment: YARN-8000.002.patch

> Yarn Service: component instance name shows up as component name in container 
> record 
> -
>
> Key: YARN-8000
> URL: https://issues.apache.org/jira/browse/YARN-8000
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8000.001.patch, YARN-8000.002.patch
>
>
> Yarn Service: component instance name shows up as component name in container 
> record 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7935) Expose container's hostname to applications running within the docker container

2018-03-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386946#comment-16386946
 ] 

Wangda Tan commented on YARN-7935:
--

Thanks [~suma.shivaprasad], the latest patch looks good, will commit the patch 
by tomorrow if no objections.

[~jlowe], [~tgraves], do you want to take another look at the patch before I 
commit?

> Expose container's hostname to applications running within the docker 
> container
> ---
>
> Key: YARN-7935
> URL: https://issues.apache.org/jira/browse/YARN-7935
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-7935.1.patch, YARN-7935.2.patch, YARN-7935.3.patch
>
>
> Some applications have a need to bind to the container's hostname (like 
> Spark) which is different from the NodeManager's hostname(NM_HOST which is 
> available as an env during container launch) when launched through Docker 
> runtime. The container's hostname can be exposed to applications via an env 
> CONTAINER_HOSTNAME. Another potential candidate is the container's IP but 
> this can be addressed in a separate jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8000) Yarn Service: component instance name shows up as component name in container record

2018-03-05 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386939#comment-16386939
 ] 

Gour Saha commented on YARN-8000:
-

[~csingh], the patch looks good. Just one comment -

In the builder, change the param name to componentInstanceName instead of 
componentName, as shown below:
{code:java}
public Container componentInstanceName(String componentInstanceName) {
  this.componentInstanceName = componentInstanceName;
  return this;
}
{code}

> Yarn Service: component instance name shows up as component name in container 
> record 
> -
>
> Key: YARN-8000
> URL: https://issues.apache.org/jira/browse/YARN-8000
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8000.001.patch
>
>
> Yarn Service: component instance name shows up as component name in container 
> record 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7626) Allow regular expression matching in container-executor.cfg for devices and named docker volumes mount

2018-03-05 Thread Zian Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386920#comment-16386920
 ] 

Zian Chen commented on YARN-7626:
-

Sure [~miklos.szeg...@cloudera.com], totally agree with that style change. Will 
update the patch shortly. Thank you [~leftnoteasy] for the review as well.

> Allow regular expression matching in container-executor.cfg for devices and 
> named docker volumes mount
> --
>
> Key: YARN-7626
> URL: https://issues.apache.org/jira/browse/YARN-7626
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-7626.001.patch, YARN-7626.002.patch, 
> YARN-7626.003.patch, YARN-7626.004.patch, YARN-7626.005.patch, 
> YARN-7626.006.patch, YARN-7626.007.patch, YARN-7626.008.patch, 
> YARN-7626.009.patch, YARN-7626.010.patch
>
>
> Currently, when we configure some of the GPU device related fields (like ) in 
> container-executor.cfg, these fields are generated based on different driver 
> versions or GPU device names. We want to enable regular expression matching 
> so that users don't need to manually set up these fields when configuring 
> container-executor.cfg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386894#comment-16386894
 ] 

genericqa commented on YARN-7999:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
26m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  0m 46s{color} | 
{color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 8 new + 0 unchanged - 0 fixed = 8 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
39s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-7999 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913088/YARN-7999.001.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 970e107ec735 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 245751f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| cc | 
https://builds.apache.org/job/PreCommit-YARN-Build/19891/artifact/out/diff-compile-cc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19891/testReport/ |
| Max. process+thread count | 410 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19891/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
> 

[jira] [Commented] (YARN-8000) Yarn Service: component instance name shows up as component name in container record

2018-03-05 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386817#comment-16386817
 ] 

Chandni Singh commented on YARN-8000:
-

[~gsaha] could you please help review?

> Yarn Service: component instance name shows up as component name in container 
> record 
> -
>
> Key: YARN-8000
> URL: https://issues.apache.org/jira/browse/YARN-8000
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8000.001.patch
>
>
> Yarn Service: component instance name shows up as component name in container 
> record 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-05 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned YARN-7999:


Assignee: Jason Lowe  (was: Shane Kumpf)
Target Version/s: 3.1.0

Got around to testing this patch, and it fixes the issue.  Replicated Eric's 
error by manually removing the filecache directory before a docker launch, and 
verified that applying the patch allows the docker command to run even if the 
filecache directory is missing before the docker launch request.

[~shaneku...@gmail.com] feel free to take this back over if you have a cleaner 
solution.

> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Jason Lowe
>Priority: Major
> Attachments: YARN-7999.001.patch
>
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8000) Yarn Service: component instance name shows up as component name in container record

2018-03-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386803#comment-16386803
 ] 

genericqa commented on YARN-8000:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 41s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
33s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m 47s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8000 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913103/YARN-8000.001.patch |
| Optional Tests |  asflicense  mvnsite  compile  javac  javadoc  mvninstall  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e67d703b72d2 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 245751f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 

[jira] [Updated] (YARN-8000) Yarn Service: component instance name shows up as component name in container record

2018-03-05 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8000:

Attachment: YARN-8000.001.patch

> Yarn Service: component instance name shows up as component name in container 
> record 
> -
>
> Key: YARN-8000
> URL: https://issues.apache.org/jira/browse/YARN-8000
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8000.001.patch
>
>
> Yarn Service: component instance name shows up as component name in container 
> record 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7626) Allow regular expression matching in container-executor.cfg for devices and named docker volumes mount

2018-03-05 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386738#comment-16386738
 ] 

Miklos Szegedi commented on YARN-7626:
--

Thank you for the patch [~Zian Chen] and for the review [~leftnoteasy].

Optional: I have one style issue with the latest patch. Where you use 6 in your 
patch, like in the line below, you should probably use sizeof("regex:") instead. 
This makes the code easier to understand and it is more future proof.
{code:java}
132 return is_volume_name(requested) && (execute_regex_match(pattern + 6, 
requested) == 0);{code}

> Allow regular expression matching in container-executor.cfg for devices and 
> named docker volumes mount
> --
>
> Key: YARN-7626
> URL: https://issues.apache.org/jira/browse/YARN-7626
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-7626.001.patch, YARN-7626.002.patch, 
> YARN-7626.003.patch, YARN-7626.004.patch, YARN-7626.005.patch, 
> YARN-7626.006.patch, YARN-7626.007.patch, YARN-7626.008.patch, 
> YARN-7626.009.patch, YARN-7626.010.patch
>
>
> Currently, when we configure some of the GPU device related fields (like ) in 
> container-executor.cfg, these fields are generated based on different driver 
> versions or GPU device names. We want to enable regular expression matching 
> so that users don't need to manually set up these fields when configuring 
> container-executor.cfg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8000) Yarn Service: component instance name shows up as component name in container record

2018-03-05 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8000:
---

 Summary: Yarn Service: component instance name shows up as 
component name in container record 
 Key: YARN-8000
 URL: https://issues.apache.org/jira/browse/YARN-8000
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chandni Singh
Assignee: Chandni Singh


Yarn Service: component instance name shows up as component name in container 
record 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7952) Find a way to persist the log aggregation status

2018-03-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386700#comment-16386700
 ] 

Wangda Tan commented on YARN-7952:
--

Thanks [~xgong], 

It might be better to update the default value in a separate JIRA, since it is 
not directly related to this one.

1) nmLogAggregationStatusTracker should be part of NodeManager.java instead of 
CM.
2) NMLogAggregationStatusTracker:
2.1 It looks like pullLock should be a readLock and updateLocker should be a 
writeLock since pull doesn't change internal data, correct?

2.2 It's better to rename pullLocker/updateLocker to read/writeLock for 
readability.

2.3 For the overall workflow of this class, IMO it should be:
a. When log aggregation starts, the application will be added to 
{{NMLogAggregationStatusTracker}} with a timestamp.
b. When the RM acknowledges that the application finished, the application will 
be removed from {{NMLogAggregationStatusTracker}}.
c. If the configured timeout is reached, the application will be removed from 
{{NMLogAggregationStatusTracker}}.

I can see c. is handled by {{rollLogAggregationStatus}} and a. is handled by 
{{updateLogAggregationStatus}}, but I'm not sure about b. When I look at the 
logic below, I'm not sure it works:

{code}
  if (currentTime - updateTime > rollingInterval) {
LOG.warn("Ignore the log aggregation status update request "
+ "for the application:" + appId + ". The log aggregation 
status"
+ " update time is " + updateTime + " while the request process 
"
+ "time is " + currentTime + ".");
return;
  }
{code}

Since currentTime is almost always equal to updateTime, the if statement should 
be false, correct? And I think if the application is null on the NM side, we 
should not add it to the tracker, correct?

Instead of this, should we explicitly notify {{NMLogAggregationStatusTracker}} 
in {{ApplicationImpl.AppLogsAggregatedTransition}}? (Is it true that 
{{ApplicationImpl.AppLogsAggregatedTransition}} means the RM acks that log 
aggregation is finished?) See the sketch at the end of this comment.

2.4 I'm not sure we should ever change the {{lastModifiedTime}}. IIUC, the RM 
will time out the log aggregation status after the configured timeout. What's 
the purpose of updating the {{lastModifiedTime}}? Could this potentially cause 
an OOM issue?

2.5 updateLogAggregationStatus: 
- the {{long updateTime}} parameter is not necessary; we can always use 
{{System.currentTimeMillis()}} instead.

3) {{LogAggregationTrakcer}}
- {{LogAggregationTrakcer}} => AppLogAggregationStatusForRMRecovery (it's not a 
tracker since it is a passive status, and it's better to emphasize the purpose 
of this class)
- {{getLastModifiedTime}} should be {{logAggregationStartedTime}} if you agree 
with 2.4

4) The comment on YarnConfiguration.LOG_AGGREGATION_STATUS_TIME_OUT_MS should be 
updated; it will be consumed by the NM as well.

Will include another detailed scan in the next review.
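
Regarding 2.3 b. above, a minimal sketch of the explicit-notification idea, 
assuming {{NMLogAggregationStatusTracker}} exposes (or would expose) a 
stop-tracking method and is reachable from the NM {{Context}}; these names are 
assumptions, not the patch's actual API:
{code:java}
// Sketch only: when the RM acks that log aggregation finished for the app,
// drop it from the tracker instead of waiting for the rolling timeout.
public void transition(ApplicationImpl app, ApplicationEvent event) {
  app.context.getNMLogAggregationStatusTracker()
      .stopTracking(app.getAppId());
}
{code}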

> Find a way to persist the log aggregation status
> 
>
> Key: YARN-7952
> URL: https://issues.apache.org/jira/browse/YARN-7952
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7952-poc.patch, YARN-7952.1.patch, YARN-7952.2.patch
>
>
> In MAPREDUCE-6415, we have created a CLI to har the aggregated logs, and In 
> YARN-4946: RM should write out Aggregated Log Completion file flag next to 
> logs, we have a discussion on how we can get the log aggregation status: make 
> a client call to RM or get it directly from the Distributed file system(HDFS).
> No matter which approach we would like to choose, we need to figure out a way 
> to persist the log aggregation status first. This ticket is used to track the 
> working progress for this purpose.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7736) Fix itemization in YARN federation document

2018-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386682#comment-16386682
 ] 

Hudson commented on YARN-7736:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13773 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13773/])
YARN-7736. Fix itemization in YARN federation document (aajisaka: rev 
245751ffdc4229715a0c031f57f20748ed16d8a6)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/Federation.md


> Fix itemization in YARN federation document
> ---
>
> Key: YARN-7736
> URL: https://issues.apache.org/jira/browse/YARN-7736
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Sen Zhao
>Priority: Minor
>  Labels: newbie
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2
>
> Attachments: YARN-7736.001.patch
>
>
> https://hadoop.apache.org/docs/r3.0.0/hadoop-yarn/hadoop-yarn-site/Federation.html
> {noformat}
> Assumptions:
> * We assume reasonably good connectivity across sub-clusters (e.g., we are 
> not looking to federate across DC yet, though future investigations of this 
> are not excluded).
> * We rely on HDFS federation (or equivalently scalable DFS solutions) to take 
> care of scalability of the store side.
> {noformat}
> Blank line should be inserted before itemization to render correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7405) [GQ] Bias container allocations based on global view

2018-03-05 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan reassigned YARN-7405:


Assignee: Arun Suresh  (was: Subru Krishnan)

> [GQ] Bias container allocations based on global view
> 
>
> Key: YARN-7405
> URL: https://issues.apache.org/jira/browse/YARN-7405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Carlo Curino
>Assignee: Arun Suresh
>Priority: Major
>
> Each RM in a federation should bias its local allocations of containers based 
> on the global over/under utilization of queues. As part of this the local RM 
> should account for the work that other RMs will be doing in between the 
> updates we receive via the heartbeats of YARN-7404 (the mechanics used for 
> synchronization).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7972) Support inter-app placement constraints for allocation tags by application ID

2018-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386645#comment-16386645
 ] 

Hudson commented on YARN-7972:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13772 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13772/])
YARN-7972. Support inter-app placement constraints for allocation tags (arun 
suresh: rev 1054b48c27f3158110bd0512afecded36eecb8ad)
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/AllocationTagNamespaceType.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/TargetApplications.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/AllocationTagNamespace.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/constraint/InvalidAllocationTagsQueryException.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/constraint/PlacementConstraintsUtil.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Evaluable.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/constraint/TestPlacementConstraintsUtil.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/constraint/TestAllocationTagsNamespace.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/constraint/AllocationTagsManager.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/AllocationTags.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/SingleConstraintAppPlacementAllocator.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/InvalidAllocationTagException.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/resource/PlacementConstraints.java


> Support inter-app placement constraints for allocation tags by application ID
> -
>
> Key: YARN-7972
> URL: https://issues.apache.org/jira/browse/YARN-7972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7972.001.patch, YARN-7972.002.patch, 
> YARN-7972.003.patch, YARN-7972.004.patch, YARN-7972.005.patch, 
> YARN-7972.006.patch, YARN-7972.007.patch
>
>
> Per discussion in [this 
> comment|https://issues.apache.org/jira/browse/YARN-6599focusedCommentId=16319662=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16319662]
>  in  YARN-6599, we need to support inter-app PC for allocation tags.
> This will help to do better placement when dealing with potential competing 
> resource applications, e.g don't place two tensorflow workers from two 
> different applications on one same node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7891) LogAggregationIndexedFileController should support HAR file

2018-03-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386633#comment-16386633
 ] 

Wangda Tan commented on YARN-7891:
--

[~xgong], 

Thanks for the explanation. So IIUC, the logic of the method {{getAllNodeFiles}} 
is:
1) If no HAR file is present, return all {{nodeFiles}}.
2) If any HAR file is present, return only the files under the existing HAR file.

If the HAR file is treated as the final log aggregation result, this logic looks 
correct to me. (From MAPREDUCE-6415, it looks final):
{code}
235   @VisibleForTesting
236   void findAggregatedApps() throws IOException, YarnException {
237 YarnClient client = YarnClient.createYarnClient();
238 try {
239   client.init(getConf());
240   client.start();
241   List reports = client.getApplications();
242   for (ApplicationReport report : reports) {
243 LogAggregationStatus aggStatus = 
report.getLogAggregationStatus();
244 if (aggStatus.equals(LogAggregationStatus.SUCCEEDED) ||
245 aggStatus.equals(LogAggregationStatus.FAILED)) {
246   eligibleApplications.add(report);
247 }
248   }
249 } finally {
250   if (client != null) {
251 client.stop();
252   }
253 }
254   }
{code}

If you can confirm my understanding is correct, the patch looks good to me.
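
A minimal sketch of that understanding (not the actual patch code; the helper 
name below is illustrative, not {{getAllNodeFiles}} itself):
{code:java}
// Sketch: if a *.har file is present among the aggregated node files, treat it as
// the final result and list only its contents; otherwise return the plain node files.
List<FileStatus> listNodeFiles(List<FileStatus> nodeFiles, Configuration conf)
    throws IOException {
  for (FileStatus status : nodeFiles) {
    if (status.getPath().getName().endsWith(".har")) {
      Path harPath = new Path("har:///" + status.getPath().toUri().getRawPath());
      FileSystem harFs = harPath.getFileSystem(conf);
      return Arrays.asList(harFs.listStatus(harPath));
    }
  }
  return nodeFiles;
}
{code}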


> LogAggregationIndexedFileController should support HAR file
> ---
>
> Key: YARN-7891
> URL: https://issues.apache.org/jira/browse/YARN-7891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7891.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7736) Fix itemization in YARN federation document

2018-03-05 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386624#comment-16386624
 ] 

Akira Ajisaka commented on YARN-7736:
-

+1

> Fix itemization in YARN federation document
> ---
>
> Key: YARN-7736
> URL: https://issues.apache.org/jira/browse/YARN-7736
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Sen Zhao
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-7736.001.patch
>
>
> https://hadoop.apache.org/docs/r3.0.0/hadoop-yarn/hadoop-yarn-site/Federation.html
> {noformat}
> Assumptions:
> * We assume reasonably good connectivity across sub-clusters (e.g., we are 
> not looking to federate across DC yet, though future investigations of this 
> are not excluded).
> * We rely on HDFS federation (or equivalently scalable DFS solutions) to take 
> care of scalability of the store side.
> {noformat}
> Blank line should be inserted before itemization to render correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7988) Refactor FSNodeLabelStore code for attributes store support

2018-03-05 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386555#comment-16386555
 ] 

Bibin A Chundatt commented on YARN-7988:


[~Naganarasimha]/[~sunil.gov...@gmail.com]/[~cheersyang]
 Could you review the latest patch? The FSEditLogOp format is followed in the 
current implementation.


> Refactor FSNodeLabelStore code for attributes store support
> ---
>
> Key: YARN-7988
> URL: https://issues.apache.org/jira/browse/YARN-7988
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-7988-YARN-3409.002.patch, 
> YARN-7988-YARN-3409.003.patch, YARN-7988-YARN-3409.004.patch, 
> YARN-7988.001.patch
>
>
> # Abstract out the FileSystemStore operations
> # Define EditLog operations and mirror operation
> # Support compatibility with the old node label store



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386517#comment-16386517
 ] 

Jason Lowe commented on YARN-7999:
--

I haven't had a chance to test this at all yet, but here's a patch that should 
ensure the user filecache directory is present when launching Docker containers. 
The main drawback to this approach is that there are now two places in the code 
that could set up the user filecache directory (one in the container localizer 
and one in the container executor).
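
A rough sketch of the idea, assuming the launch path has the NM local dir, the 
user name, and a local {{FileContext}} at hand (the surrounding variable names 
are assumptions; {{ContainerLocalizer}} provides the usercache/filecache path 
constants):
{code:java}
// Sketch only: make sure the user's private filecache directory exists before
// the Docker launch tries to bind-mount it.
Path userFileCacheDir = new Path(localDir,
    ContainerLocalizer.USERCACHE + Path.SEPARATOR + user
        + Path.SEPARATOR + ContainerLocalizer.FILECACHE);
FileContext lfs = FileContext.getLocalFSFileContext();
if (!lfs.util().exists(userFileCacheDir)) {
  lfs.mkdir(userFileCacheDir, FsPermission.getDirDefault(), true);
}
{code}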

> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7999.001.patch
>
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-05 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-7999:
-
Attachment: YARN-7999.001.patch

> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7999.001.patch
>
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-05 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reassigned YARN-7999:
---

Assignee: Shane Kumpf

> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Shane Kumpf
>Priority: Major
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7972) Support inter-app placement constraints for allocation tags by application ID

2018-03-05 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386452#comment-16386452
 ] 

Arun Suresh commented on YARN-7972:
---

Thanks for the update [~cheersyang].
+1 to the latest version - it definitely looks cleaner, and the testcases 
demonstrate the API usage nicely.

> Support inter-app placement constraints for allocation tags by application ID
> -
>
> Key: YARN-7972
> URL: https://issues.apache.org/jira/browse/YARN-7972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-7972.001.patch, YARN-7972.002.patch, 
> YARN-7972.003.patch, YARN-7972.004.patch, YARN-7972.005.patch, 
> YARN-7972.006.patch, YARN-7972.007.patch
>
>
> Per discussion in [this 
> comment|https://issues.apache.org/jira/browse/YARN-6599focusedCommentId=16319662=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16319662]
>  in  YARN-6599, we need to support inter-app PC for allocation tags.
> This will help to do better placement when dealing with applications that 
> potentially compete for resources, e.g. don't place two TensorFlow workers 
> from two different applications on the same node.
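
As a rough sketch of the kind of constraint described above, the snippet below uses the existing PlacementConstraints builder to express node-scoped anti-affinity on an allocation tag; it requires hadoop-yarn-api on the classpath, and the application-ID namespacing added by this JIRA is only mentioned in the comment, since its exact API surface is not reproduced here.

{code:java}
import org.apache.hadoop.yarn.api.resource.PlacementConstraint;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.NODE;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.build;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.targetNotIn;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.allocationTag;

public class AntiAffinityExample {
  public static void main(String[] args) {
    // Anti-affinity at node scope: do not place this container on a node that
    // already holds a container carrying the "tf-worker" allocation tag.
    // Today the tag is resolved within the same application; YARN-7972 lets
    // the tag target be namespaced by another application's ID so the same
    // anti-affinity can be expressed across applications.
    PlacementConstraint noTwoWorkersPerNode =
        build(targetNotIn(NODE, allocationTag("tf-worker")));
    System.out.println(noTwoWorkersPerNode);
  }
}
{code}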



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7915) Trusted image log message repeated multiple times

2018-03-05 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386441#comment-16386441
 ] 

Shane Kumpf commented on YARN-7915:
---

Thanks for the review [~ebadger] and thank you [~billie.rinaldi] for the 
review/commit!

> Trusted image log message repeated multiple times 
> --
>
> Key: YARN-7915
> URL: https://issues.apache.org/jira/browse/YARN-7915
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0
>Reporter: Eric Badger
>Assignee: Shane Kumpf
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7915.001.patch
>
>
> Every time we call {{check_trusted_image()}} we get a log message saying 
> whether the image is trusted or not. In the case where it is trusted, the log 
> message will get printed once for every call to the function. It's 
> unnecessarily repetitive. I'm not really sure we need the log at all if the 
> image is trusted. Maybe only log if it isn't trusted.
> {noformat}
> Application application_1518201929288_0010 failed 3 times due to AM Container 
> for appattempt_1518201929288_0010_03 exited with exitCode: 1
> Failing this attempt.Diagnostics: [2018-02-09 20:32:09.391]Exception from 
> container-launch.
> Container id: container_1518201929288_0010_03_01
> Exit code: 1
> Exception message: image: foo/bar is trusted in foo registry.
> image: foo/bar is trusted in foo registry.
> image: foo/bar is trusted in foo registry.
> Docker container exit code was not zero: 1
> Unable to read from docker logs(ferror, feof): 0 1
> Shell output: main : command provided 4
> main : run as user is ebadger
> main : requested yarn user is ebadger
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Launching docker container...
> {noformat}
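
A hypothetical Java sketch of the guard being discussed; the real check lives in the native container-executor code, so this is illustrative only: stay quiet when the image is trusted and emit a message only when it is not, so repeated calls no longer flood the diagnostics. The trusted registry names are placeholders.

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TrustedImageCheck {
  // Placeholder registries; the real list comes from the container-executor
  // configuration (docker.trusted.registries).
  private static final Set<String> TRUSTED_REGISTRIES =
      new HashSet<>(Arrays.asList("hadoop", "foo"));

  static boolean isTrusted(String image) {
    String registry = image.contains("/")
        ? image.substring(0, image.indexOf('/')) : "";
    boolean trusted = TRUSTED_REGISTRIES.contains(registry);
    if (!trusted) {
      // Only the untrusted case is surfaced; trusted images stay silent, so
      // repeated checks no longer repeat "image: ... is trusted ..." lines.
      System.err.println("image: " + image + " is not trusted.");
    }
    return trusted;
  }

  public static void main(String[] args) {
    System.out.println(isTrusted("hadoop/centos:latest")); // true, no message
    System.out.println(isTrusted("random/image:1"));       // false, one message
  }
}
{code}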



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Moved] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-05 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang moved HADOOP-15284 to YARN-7999:
--

Affects Version/s: (was: 3.1.0)
   3.1.0
  Key: YARN-7999  (was: HADOOP-15284)
  Project: Hadoop YARN  (was: Hadoop Common)

> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Priority: Major
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7915) Trusted image log message repeated multiple times

2018-03-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386403#comment-16386403
 ] 

Hudson commented on YARN-7915:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13770 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13770/])
YARN-7915. Trusted image log message repeated multiple times. (billie: rev 
628be58a4ca7df33d92b7f1e5d064ab16085e81a)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c


> Trusted image log message repeated multiple times 
> --
>
> Key: YARN-7915
> URL: https://issues.apache.org/jira/browse/YARN-7915
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0
>Reporter: Eric Badger
>Assignee: Shane Kumpf
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7915.001.patch
>
>
> Every time we call {{check_trusted_image()}} we get a log message saying 
> whether the image is trusted or not. In the case where it is trusted, the log 
> message will get printed once for every call to the function. It's 
> unnecessarily repetitive. I'm not really sure we need the log at all if the 
> image is trusted. Maybe only log if it isn't trusted.
> {noformat}
> Application application_1518201929288_0010 failed 3 times due to AM Container 
> for appattempt_1518201929288_0010_03 exited with exitCode: 1
> Failing this attempt.Diagnostics: [2018-02-09 20:32:09.391]Exception from 
> container-launch.
> Container id: container_1518201929288_0010_03_01
> Exit code: 1
> Exception message: image: foo/bar is trusted in foo registry.
> image: foo/bar is trusted in foo registry.
> image: foo/bar is trusted in foo registry.
> Docker container exit code was not zero: 1
> Unable to read from docker logs(ferror, feof): 0 1
> Shell output: main : command provided 4
> main : run as user is ebadger
> main : requested yarn user is ebadger
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Launching docker container...
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7915) Trusted image log message repeated multiple times

2018-03-05 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-7915:
-
Affects Version/s: 3.1.0

> Trusted image log message repeated multiple times 
> --
>
> Key: YARN-7915
> URL: https://issues.apache.org/jira/browse/YARN-7915
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0
>Reporter: Eric Badger
>Assignee: Shane Kumpf
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7915.001.patch
>
>
> Every time we call {{check_trusted_image()}} we get a log message saying 
> whether the image is trusted or not. In the case where it is trusted, the log 
> message will get printed once for every call to the function. It's 
> unnecessarily repetitive. I'm not really sure we need the log at all if the 
> image is trusted. Maybe only log if it isn't trusted.
> {noformat}
> Application application_1518201929288_0010 failed 3 times due to AM Container 
> for appattempt_1518201929288_0010_03 exited with exitCode: 1
> Failing this attempt.Diagnostics: [2018-02-09 20:32:09.391]Exception from 
> container-launch.
> Container id: container_1518201929288_0010_03_01
> Exit code: 1
> Exception message: image: foo/bar is trusted in foo registry.
> image: foo/bar is trusted in foo registry.
> image: foo/bar is trusted in foo registry.
> Docker container exit code was not zero: 1
> Unable to read from docker logs(ferror, feof): 0 1
> Shell output: main : command provided 4
> main : run as user is ebadger
> main : requested yarn user is ebadger
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Launching docker container...
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7915) Trusted image log message repeated multiple times

2018-03-05 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386383#comment-16386383
 ] 

Billie Rinaldi commented on YARN-7915:
--

+1 for patch 001. Thanks for the patch [~shaneku...@gmail.com] and for the 
review [~ebadger]!

> Trusted image log message repeated multiple times 
> --
>
> Key: YARN-7915
> URL: https://issues.apache.org/jira/browse/YARN-7915
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7915.001.patch
>
>
> Every time we call {{check_trusted_image()}} we get a log message saying 
> whether the image is trusted or not. In the case where it is trusted, the log 
> message will get printed once for every call to the function. It's 
> unnecessarily repetitive. I'm not really sure we need the log at all if the 
> image is trusted. Maybe only log if it isn't trusted.
> {noformat}
> Application application_1518201929288_0010 failed 3 times due to AM Container 
> for appattempt_1518201929288_0010_03 exited with exitCode: 1
> Failing this attempt.Diagnostics: [2018-02-09 20:32:09.391]Exception from 
> container-launch.
> Container id: container_1518201929288_0010_03_01
> Exit code: 1
> Exception message: image: foo/bar is trusted in foo registry.
> image: foo/bar is trusted in foo registry.
> image: foo/bar is trusted in foo registry.
> Docker container exit code was not zero: 1
> Unable to read from docker logs(ferror, feof): 0 1
> Shell output: main : command provided 4
> main : run as user is ebadger
> main : requested yarn user is ebadger
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Launching docker container...
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7346) Add a profile to allow optional compilation for ATSv2 with HBase-2.0

2018-03-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386374#comment-16386374
 ] 

Rohith Sharma K S commented on YARN-7346:
-

+1, LGTM. I will commit it later today if there are no more objections. Thanks 
[~haibochen]

> Add a profile to allow optional compilation for ATSv2 with HBase-2.0
> 
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7346.00.patch, YARN-7346.01.patch, 
> YARN-7346.02.patch, YARN-7346.03-incremental.patch, YARN-7346.03.patch, 
> YARN-7346.04-incremental.patch, YARN-7346.04.patch, YARN-7346.05.patch, 
> YARN-7346.06.patch, YARN-7346.07.patch, YARN-7346.08-incremental.patch, 
> YARN-7346.08.patch, YARN-7346.09.patch, YARN-7346.10.patch, 
> YARN-7346.11.patch, YARN-7346.prelim1.patch, YARN-7346.prelim2.patch, 
> YARN-7581.prelim.patch, 
> hadoop-yarn-server-timelineservice-hbase-server-1-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-1-javadoc-report.txt, 
> hadoop-yarn-server-timelineservice-hbase-server-2-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-2-javadoc-report.txt
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> [https://pastebin.com/Ms4jYEVB]
> This issue is to fix the compilation errors.
> The scope of the Jira is to add a profile to allow optional compilation for 
> ATSv2 with HBase2.0. The default compilation for trunk will still be for 
> hbase 1.2.6. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2018-03-05 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386373#comment-16386373
 ] 

Eric Yang commented on YARN-7221:
-

Hi [~ebadger], can you give patch 006 a try?  Thanks

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-7221.001.patch, YARN-7221.002.patch, 
> YARN-7221.003.patch, YARN-7221.004.patch, YARN-7221.005.patch, 
> YARN-7221.006.patch
>
>
> When a Docker container is running with privileges, the majority of use cases 
> involve some program starting as root and then dropping privileges to another 
> user, e.g. httpd starting privileged to bind to port 80, then dropping 
> privileges to the www user.  
> # We should add a security check for submitting users, to verify they have 
> "sudo" access to run a privileged container.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with both the --privileged=true and --user=uid:gid 
> flags.  With this parameter combination, the user will not have access to 
> become the root user.  All docker exec commands will be dropped to the 
> uid:gid user instead of being granted privileges.  A user can gain root 
> privileges if the container file system contains files that give the user 
> extra power, but this type of image is considered dangerous.  A non-privileged 
> user can launch a container with special bits to acquire the same level of 
> root power.  Hence, we lose control of which images should be run with 
> --privileged, and who has sudo rights to use privileged container images.  As 
> a result, we should check for sudo access and then decide to parameterize 
> --privileged=true OR --user=uid:gid.  This will avoid leading developers down 
> the wrong path.
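
A hypothetical sketch of the decision described above, not the actual YARN-7221 patch: only a user that passes the sudo/ACL check gets --privileged=true (with no --user override), while everyone else is forced to --user=uid:gid. The allow-list stands in for the real sudo check, and a real implementation would more likely reject a disallowed privileged request outright rather than silently downgrade it.

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PrivilegedDecision {
  // Hypothetical stand-in for the real "does this user have sudo/ACL rights
  // to run privileged containers" check.
  private static final Set<String> PRIVILEGED_USERS =
      new HashSet<>(Arrays.asList("hbase"));

  static String dockerUserArgs(String user, int uid, int gid, boolean wantsPrivileged) {
    if (wantsPrivileged && PRIVILEGED_USERS.contains(user)) {
      // Privileged run: the container may start as root and drop privileges
      // itself (e.g. httpd binding to port 80), so --user is omitted.
      return "--privileged=true";
    }
    // Default: run the container as the submitting user's uid:gid.
    return "--user=" + uid + ":" + gid;
  }

  public static void main(String[] args) {
    System.out.println(dockerUserArgs("hbase", 1001, 1001, true));   // --privileged=true
    System.out.println(dockerUserArgs("ebadger", 1002, 1002, true)); // --user=1002:1002
  }
}
{code}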



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7346) Add a profile to allow optional compilation for ATSv2 with HBase-2.0

2018-03-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386342#comment-16386342
 ] 

genericqa commented on YARN-7346:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-assemblies 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests
 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 
11s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 39s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-assemblies 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests
 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
19s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
20s{color} | {color:green} hadoop-assemblies in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
29s{color} | {color:green} hadoop-yarn-server-timelineservice-hbase-client in 
the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
26s{color} | {color:green} 

[jira] [Commented] (YARN-7915) Trusted image log message repeated multiple times

2018-03-05 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386316#comment-16386316
 ] 

Eric Badger commented on YARN-7915:
---

lgtm +1 (non-binding). Thanks, [~shaneku...@gmail.com]!

> Trusted image log message repeated multiple times 
> --
>
> Key: YARN-7915
> URL: https://issues.apache.org/jira/browse/YARN-7915
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7915.001.patch
>
>
> Every time we call {{check_trusted_image()}} we get a log message saying 
> whether the image is trusted or not. In the case where it is trusted, the log 
> message will get printed once for every call to the function. It's 
> unnecessarily repetitive. I'm not really sure we need the log at all if the 
> image is trusted. Maybe only log if it isn't trusted.
> {noformat}
> Application application_1518201929288_0010 failed 3 times due to AM Container 
> for appattempt_1518201929288_0010_03 exited with exitCode: 1
> Failing this attempt.Diagnostics: [2018-02-09 20:32:09.391]Exception from 
> container-launch.
> Container id: container_1518201929288_0010_03_01
> Exit code: 1
> Exception message: image: foo/bar is trusted in foo registry.
> image: foo/bar is trusted in foo registry.
> image: foo/bar is trusted in foo registry.
> Docker container exit code was not zero: 1
> Unable to read from docker logs(ferror, feof): 0 1
> Shell output: main : command provided 4
> main : run as user is ebadger
> main : requested yarn user is ebadger
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Launching docker container...
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7996) Allow user supplied Docker client configurations with YARN native services

2018-03-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386270#comment-16386270
 ] 

genericqa commented on YARN-7996:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  4m 
31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 38s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
36s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  5m 
20s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
33s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
17s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 82m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-7996 |
| JIRA Patch URL | 

[jira] [Updated] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-03-05 Thread Oleksandr Shevchenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksandr Shevchenko updated YARN-7998:
---
Attachment: YARN-7998.001.patch

> RM crashes with NPE during recovering if ACL configuration was changed
> --
>
> Key: YARN-7998
> URL: https://issues.apache.org/jira/browse/YARN-7998
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: Oleksandr Shevchenko
>Priority: Major
> Attachments: YARN-7998.000.patch, YARN-7998.001.patch
>
>
> RM crashes with NPE during failover because ACL configurations were changed; 
> as a result, we no longer have the rights to submit an application to a queue.
> Scenario:
>  # Submit an application
>  # Change the ACL configuration for the queue that accepted the application so 
> that the owner of the application no longer has the rights to submit this 
> application.
>  # Restart RM.
> As a result, we get NPE:
> 2018-02-27 18:14:00,968 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state STARTED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:738)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1286)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:116)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1098)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1044)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7346) Add a profile to allow optional compilation for ATSv2 with HBase-2.0

2018-03-05 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-7346:
-
Attachment: YARN-7346.11.patch

> Add a profile to allow optional compilation for ATSv2 with HBase-2.0
> 
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7346.00.patch, YARN-7346.01.patch, 
> YARN-7346.02.patch, YARN-7346.03-incremental.patch, YARN-7346.03.patch, 
> YARN-7346.04-incremental.patch, YARN-7346.04.patch, YARN-7346.05.patch, 
> YARN-7346.06.patch, YARN-7346.07.patch, YARN-7346.08-incremental.patch, 
> YARN-7346.08.patch, YARN-7346.09.patch, YARN-7346.10.patch, 
> YARN-7346.11.patch, YARN-7346.prelim1.patch, YARN-7346.prelim2.patch, 
> YARN-7581.prelim.patch, 
> hadoop-yarn-server-timelineservice-hbase-server-1-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-1-javadoc-report.txt, 
> hadoop-yarn-server-timelineservice-hbase-server-2-findbugsXml.xml, 
> hadoop-yarn-server-timelineservice-hbase-server-2-javadoc-report.txt
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> [https://pastebin.com/Ms4jYEVB]
> This issue is to fix the compilation errors.
> The scope of the Jira is to add a profile to allow optional compilation for 
> ATSv2 with HBase2.0. The default compilation for trunk will still be for 
> hbase 1.2.6. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7996) Allow user supplied Docker client configurations with YARN native services

2018-03-05 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386144#comment-16386144
 ] 

Shane Kumpf commented on YARN-7996:
---

Attached a new patch to address the checkstyle warnings.

I also fixed a minor bug in 
{{DockerClientConfigHandler#writeDockerCredentialsToPath}} where the 
{{config.json}} would be written out even if no Docker credentials were passed 
through the CLC. The {{config.json}} was valid, but contained no auth tokens, 
so it was an unnecessary file write for each container.
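
A self-contained sketch of that guard, using hypothetical names rather than the real DockerClientConfigHandler API: the per-container config.json (Docker's standard "auths" layout) is written only when at least one registry auth token was actually passed through the container launch context.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Map;

public class DockerConfigWriter {
  /**
   * Writes a Docker client config.json only if at least one registry auth
   * token is present. Returns true if a file was written, false otherwise.
   */
  static boolean writeConfigIfNeeded(Map<String, String> registryToAuthToken,
      Path outFile) throws IOException {
    if (registryToAuthToken == null || registryToAuthToken.isEmpty()) {
      // No credentials were passed through the container launch context:
      // skip the write instead of producing an empty, useless config.json.
      return false;
    }
    StringBuilder json = new StringBuilder("{\n  \"auths\": {\n");
    int i = 0;
    for (Map.Entry<String, String> e : registryToAuthToken.entrySet()) {
      json.append("    \"").append(e.getKey()).append("\": { \"auth\": \"")
          .append(e.getValue()).append("\" }")
          .append(++i < registryToAuthToken.size() ? ",\n" : "\n");
    }
    json.append("  }\n}\n");
    Files.write(outFile, json.toString().getBytes(StandardCharsets.UTF_8));
    return true;
  }

  public static void main(String[] args) throws IOException {
    Path out = Paths.get("config.json");
    // Nothing supplied -> nothing written.
    System.out.println(writeConfigIfNeeded(Collections.<String, String>emptyMap(), out));
  }
}
{code}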

> Allow user supplied Docker client configurations with YARN native services
> --
>
> Key: YARN-7996
> URL: https://issues.apache.org/jira/browse/YARN-7996
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7996.001.patch, YARN-7996.002.patch
>
>
> YARN-5428 added support to distributed shell for supplying a Docker client 
> configuration at application submission time. The auth tokens within the 
> client configuration are then used to pull images from private Docker 
> repositories/registries. Add the same support to the YARN Native Services 
> framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7996) Allow user supplied Docker client configurations with YARN native services

2018-03-05 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-7996:
--
Attachment: YARN-7996.002.patch

> Allow user supplied Docker client configurations with YARN native services
> --
>
> Key: YARN-7996
> URL: https://issues.apache.org/jira/browse/YARN-7996
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7996.001.patch, YARN-7996.002.patch
>
>
> YARN-5428 added support to distributed shell for supplying a Docker client 
> configuration at application submission time. The auth tokens within the 
> client configuration are then used to pull images from private Docker 
> repositories/registries. Add the same support to the YARN Native Services 
> framework.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7987) Docker container name(--name) needs to be DNS friendly for DNS resolution to work in user defined networks.

2018-03-05 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386113#comment-16386113
 ] 

Shane Kumpf commented on YARN-7987:
---

{quote}Created YARN-7994 to track adding support for network-alias
{quote}
Thanks, [~suma.shivaprasad]! Do you think we can close this issue or do we 
still need to explore this after YARN-7994?

> Docker container name(--name) needs to be DNS friendly for DNS resolution to 
> work in user defined networks. 
> 
>
> Key: YARN-7987
> URL: https://issues.apache.org/jira/browse/YARN-7987
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
>
> User-defined networks like overlays support DNS resolution through the Docker 
> Embedded DNS, which needs the container name (the --name parameter value in 
> docker run) to be an FQDN for container names to be resolved - please refer to 
> the documentation at 
> [https://docs.docker.com/v17.09/engine/userguide/networking/configure-dns/]
> However, YARN sets the container name to the container's ID, which is not DNS 
> friendly (e.g. container_e26_1519402686002_0035_01_03) and is not an FQDN. 
> The proposal is to set an FQDN (e.g. 
> ctr-e26-1519402686002-0035-01-03.domain-name) as the Docker container's 
> name so containers can communicate with each other via hostnames in 
> user-defined networks like overlays, bridges, etc. The domain name will be 
> picked up from the YARN DNS registry configuration 
> (hadoop.registry.dns.domain-name).
>  
>  
>  
>  
>  
>  
>  
>  
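
A small illustrative sketch of the proposed naming scheme: turn a YARN container ID into a DNS-friendly Docker container name by swapping the container prefix for ctr, replacing underscores (not valid in hostnames) with hyphens, and appending the domain from hadoop.registry.dns.domain-name; the example.com domain below is just a placeholder.

{code:java}
public class DnsFriendlyName {
  /**
   * Derives a DNS-friendly Docker container name from a YARN container ID,
   * e.g. container_e26_1519402686002_0035_01_03 ->
   * ctr-e26-1519402686002-0035-01-03.example.com.
   */
  static String toDockerName(String containerId, String domain) {
    // "ctr" replaces the "container" prefix and hyphens replace underscores,
    // since underscores are not valid in DNS hostnames.
    String host = containerId.replaceFirst("^container", "ctr").replace('_', '-');
    return host + "." + domain;
  }

  public static void main(String[] args) {
    System.out.println(
        toDockerName("container_e26_1519402686002_0035_01_03", "example.com"));
  }
}
{code}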



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7915) Trusted image log message repeated multiple times

2018-03-05 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386110#comment-16386110
 ] 

Shane Kumpf commented on YARN-7915:
---

There are no test changes as this removes a single logging statement. I believe 
this is ready for review. [~billie.rinaldi]

> Trusted image log message repeated multiple times 
> --
>
> Key: YARN-7915
> URL: https://issues.apache.org/jira/browse/YARN-7915
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7915.001.patch
>
>
> Every time we call {{check_trusted_image()}} we get a log message saying 
> whether the image is trusted or not. In the case where it is trusted, the log 
> message will get printed once for every call to the function. It's 
> unnecessarily repetitive. I'm not really sure we need the log at all if the 
> image is trusted. Maybe only log if it isn't trusted.
> {noformat}
> Application application_1518201929288_0010 failed 3 times due to AM Container 
> for appattempt_1518201929288_0010_03 exited with exitCode: 1
> Failing this attempt.Diagnostics: [2018-02-09 20:32:09.391]Exception from 
> container-launch.
> Container id: container_1518201929288_0010_03_01
> Exit code: 1
> Exception message: image: foo/bar is trusted in foo registry.
> image: foo/bar is trusted in foo registry.
> image: foo/bar is trusted in foo registry.
> Docker container exit code was not zero: 1
> Unable to read from docker logs(ferror, feof): 0 1
> Shell output: main : command provided 4
> main : run as user is ebadger
> main : requested yarn user is ebadger
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Launching docker container...
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-03-05 Thread Oleksandr Shevchenko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385971#comment-16385971
 ] 

Oleksandr Shevchenko edited comment on YARN-7998 at 3/5/18 11:41 AM:
-

RM failed with NPE during failover if FairScheduler configurations were changed.

An application was not finished yet, so the application final state = null, and 
the last app attempt doesn't have a final state either.

2018-02-28 15:50:51,576 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: 
application_1517497680557_565955 *with 2 attempts and final state = null*
2018-02-28 15:50:54,761 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1517497680557_565955_01 with *final state: 
FAILED*
2018-02-28 15:50:54,766 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1517497680557_565955_02 with *final state: 
null*

In my case, an *ACL configuration in fair-scheduler.xml was changed*; as a 
result, we no longer have the rights to submit this application.

In FairScheduler#addApplication() we skip this application. We do not add it to 
the scheduler application map and send an APP_REJECTED event to move the 
application to the FAILED state.
{code:java}
if (!queue.hasAccess(QueueACL.SUBMIT_APPLICATIONS, userUgi) && !queue
.hasAccess(QueueACL.ADMINISTER_QUEUE, userUgi)) {
  String msg = "User " + userUgi.getUserName()
  + " cannot submit applications to queue " + queue.getName()
  + "(requested queuename is " + queueName + ")";
  LOG.info(msg);
  rmContext.getDispatcher().getEventHandler().handle(
  new RMAppEvent(applicationId, RMAppEventType.APP_REJECTED, msg));
  return;
} 
{code}
Then we try to recover app attempts. When we try to recover the last app 
attempt, we should check the final state of the attempt and the final state of 
the application (see RMAppAttemptImpl#transition()). As I said before, the 
application final state = null, and the last app attempt doesn't have a final 
state either. So, we check the RM app's current state in the method 
"isAppInFinalState".
{code:java}
public static boolean isAppInFinalState(RMApp rmApp) {
  RMAppState appState = ((RMAppImpl) rmApp).getRecoveredFinalState();
  if (appState == null) {
appState = rmApp.getState();
  }
  return appState == RMAppState.FAILED || appState == RMAppState.FINISHED
  || appState == RMAppState.KILLED;
}
{code}
For now, the *current state of the application is NEW because the APP_REJECTED 
event has not been processed yet* (the same issue described in YARN-7913). 
*This leads to the wrong decision to recover the attempt*. We try to get the 
user of the application in FairScheduler#addApplicationAttempt and get an NPE 
because the application is not found in the scheduler.
{code:java}
SchedulerApplication application = applications.get(
applicationAttemptId.getApplicationId());
String user = application.getUser(); //NPE
{code}
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:740)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1327)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:117)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1100)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1046)

 

*Ideally, we should process the APP_REJECTED event before we try to recover 
attempts.* But for now, I didn't find an easy way to do that.

*We can check whether the application is null.* If it is, then skip this 
attempt, the same way as in CapacityScheduler and as was proposed in YARN-2025.
{code:java}
SchedulerApplication application = applications.get(
applicationAttemptId.getApplicationId());
if (application == null) {
  LOG.warn("Application " + applicationAttemptId.getApplicationId() +
  " cannot be found in scheduler.");
  return;
}
String user = application.getUser();
{code}
As a result, RM does not fail now, but we will get an 
InvalidStateTransitonException because the APP_REJECTED event will be processed 
too late.
{noformat}
2018-02-28 16:00:24,847 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle 
this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: *Invalid event: 
APP_REJECTED at ACCEPTED.*
{noformat}
If we also add a transition from the ACCEPTED state to FAILED to the RMAppImpl 
StateMachineFactory
{code:java}
.addTransition(RMAppState.ACCEPTED, RMAppState.FINAL_SAVING,
RMAppEventType.APP_REJECTED,
new 

[jira] [Comment Edited] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-03-05 Thread Oleksandr Shevchenko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385971#comment-16385971
 ] 

Oleksandr Shevchenko edited comment on YARN-7998 at 3/5/18 11:39 AM:
-

RM failed with NPE during failover if FairScheduler configurations were changed.

An application was not finished yet, so the application final state = null, and 
the last app attempt doesn't have a final state either.

2018-02-28 15:50:51,576 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: 
application_1517497680557_565955 *with 2 attempts and final state = null*
2018-02-28 15:50:54,761 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1517497680557_565955_01 with *final state: 
FAILED*
2018-02-28 15:50:54,766 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1517497680557_565955_02 with *final state: 
null*

In my case, an *ACL configuration in fair-scheduler.xml was changed*; as a 
result, we no longer have the rights to submit this application.

In FairScheduler#addApplication() we skip this application. We do not add it to 
the scheduler application map and send an APP_REJECTED event to move the 
application to the FAILED state.
{code:java}
if (!queue.hasAccess(QueueACL.SUBMIT_APPLICATIONS, userUgi) && !queue
.hasAccess(QueueACL.ADMINISTER_QUEUE, userUgi)) {
  String msg = "User " + userUgi.getUserName()
  + " cannot submit applications to queue " + queue.getName()
  + "(requested queuename is " + queueName + ")";
  LOG.info(msg);
  rmContext.getDispatcher().getEventHandler().handle(
  new RMAppEvent(applicationId, RMAppEventType.APP_REJECTED, msg));
  return;
} 
{code}
Then we try to recover app attempts. When we try to recover the last app 
attempt, we should check the final state of the attempt and the final state of 
the application (see RMAppAttemptImpl#transition()). As I said before, the 
application final state = null, and the last app attempt doesn't have a final 
state either. So, we check the RM app's current state in the method 
"isAppInFinalState".
{code:java}
public static boolean isAppInFinalState(RMApp rmApp) {
  RMAppState appState = ((RMAppImpl) rmApp).getRecoveredFinalState();
  if (appState == null) {
appState = rmApp.getState();
  }
  return appState == RMAppState.FAILED || appState == RMAppState.FINISHED
  || appState == RMAppState.KILLED;
}
{code}
For now, the *current state of the application is NEW because the APP_REJECTED 
event has not been processed yet*, as was described by Gergo Repas. *This leads 
to the wrong decision to recover the attempt*. We try to get the user of the 
application in FairScheduler#addApplicationAttempt and get an NPE because the 
application is not found in the scheduler.
{code:java}
SchedulerApplication application = applications.get(
applicationAttemptId.getApplicationId());
String user = application.getUser(); //NPE
{code}
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:740)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1327)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:117)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1100)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1046)

 

*Ideally, we should process the APP_REJECTED event before we try to recover 
attempts.* But for now, I didn't find an easy way to do that.

*We can check whether the application is null.* If it is, then skip this 
attempt, the same way as in CapacityScheduler and as was proposed in YARN-2025.
{code:java}
SchedulerApplication application = applications.get(
applicationAttemptId.getApplicationId());
if (application == null) {
  LOG.warn("Application " + applicationAttemptId.getApplicationId() +
  " cannot be found in scheduler.");
  return;
}
String user = application.getUser();
{code}
As a result, RM does not fail now, but we will get an 
InvalidStateTransitonException because the APP_REJECTED event will be processed 
too late.
{noformat}
2018-02-28 16:00:24,847 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle 
this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: *Invalid event: 
APP_REJECTED at ACCEPTED.*
{noformat}
If we also add a transition from the ACCEPTED state to FAILED to the RMAppImpl 
StateMachineFactory
{code:java}
.addTransition(RMAppState.ACCEPTED, RMAppState.FINAL_SAVING,
RMAppEventType.APP_REJECTED,
new 

[jira] [Comment Edited] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-03-05 Thread Oleksandr Shevchenko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385971#comment-16385971
 ] 

Oleksandr Shevchenko edited comment on YARN-7998 at 3/5/18 11:35 AM:
-

RM failed with NPE during failover if FairScheduler configurations were changed.

An application was not finished yet, so the application final state = null, and 
the last app attempt doesn't have a final state either.

2018-02-28 15:50:51,576 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: 
application_1517497680557_565955 *with 2 attempts and final state = null*
2018-02-28 15:50:54,761 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1517497680557_565955_01 with *final state: 
FAILED*
2018-02-28 15:50:54,766 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1517497680557_565955_02 with *final state: 
null*

In my case, an *ACL configuration in fair-scheduler.xml was changed*; as a 
result, we no longer have the rights to submit this application.

In FairScheduler#addApplication() we skip this application. We do not add it to 
the scheduler application map and send an APP_REJECTED event to move the 
application to the FAILED state.
{code:java}
if (!queue.hasAccess(QueueACL.SUBMIT_APPLICATIONS, userUgi) && !queue
.hasAccess(QueueACL.ADMINISTER_QUEUE, userUgi)) {
  String msg = "User " + userUgi.getUserName()
  + " cannot submit applications to queue " + queue.getName()
  + "(requested queuename is " + queueName + ")";
  LOG.info(msg);
  rmContext.getDispatcher().getEventHandler().handle(
  new RMAppEvent(applicationId, RMAppEventType.APP_REJECTED, msg));
  return;
} 
{code}
Then we try to recover app attempts. When we try to recover the last app 
attempt, we should check the final state of the attempt and the final state of 
the application (see RMAppAttemptImpl#transition()). As I said before, the 
application final state = null, and the last app attempt doesn't have a final 
state either. So, we check the RM app's current state in the method 
"isAppInFinalState".
{code:java}
public static boolean isAppInFinalState(RMApp rmApp) {
  RMAppState appState = ((RMAppImpl) rmApp).getRecoveredFinalState();
  if (appState == null) {
appState = rmApp.getState();
  }
  return appState == RMAppState.FAILED || appState == RMAppState.FINISHED
  || appState == RMAppState.KILLED;
}
{code}
For now, the *current state of the application is NEW because the APP_REJECTED 
event has not been processed yet*, as was described by Gergo Repas. *This leads 
to the wrong decision to recover the attempt*. We try to get the user of the 
application in FairScheduler#addApplicationAttempt and get an NPE because the 
application is not found in the scheduler.
{code:java}
SchedulerApplication application = applications.get(
applicationAttemptId.getApplicationId());
String user = application.getUser(); //NPE
{code}
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:740)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1327)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:117)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1100)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1046)

 

*Ideally, we should process the APP_REJECTED event before we try to recover 
attempts.* But for now, I didn't find an easy way to do that.

*We can check whether the application is null.* If it is, then skip this 
attempt, the same way as in CapacityScheduler and as was proposed in YARN-2025.

Perhaps, we should open a new ticket for this.
{code:java}
SchedulerApplication application = applications.get(
applicationAttemptId.getApplicationId());
if (application == null) {
  LOG.warn("Application " + applicationAttemptId.getApplicationId() +
  " cannot be found in scheduler.");
  return;
}
String user = application.getUser();
{code}
As a result, the RM does not fail now, but we will get an 
InvalidStateTransitonException because the APP_REJECTED event will be processed 
too late.
{noformat}
2018-02-28 16:00:24,847 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle 
this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: *Invalid event: 
APP_REJECTED at ACCEPTED.*
{noformat}
If we also add a transition from the ACCEPTED state to FAILED to the RMAppImpl 
StateMachineFactory:
{code:java}
.addTransition(RMAppState.ACCEPTED, RMAppState.FINAL_SAVING,

[jira] [Comment Edited] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-03-05 Thread Oleksandr Shevchenko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385971#comment-16385971
 ] 

Oleksandr Shevchenko edited comment on YARN-7998 at 3/5/18 11:32 AM:
-

RM failed with NPE during failover if FairScheduler configurations were changed.

An application was not finished yet, so the application final state is null, 
and the last app attempt doesn't have a final state either.

2018-02-28 15:50:51,576 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: 
application_1517497680557_565955 *with 2 attempts and final state = null*
2018-02-28 15:50:54,761 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1517497680557_565955_01 with *final state: 
FAILED*
2018-02-28 15:50:54,766 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1517497680557_565955_02 with *final state: 
null*

In my case, an *ACL configuration in fair-scheduler.xml was changed*; as a 
result, we no longer have the rights to submit this application.

In FairScheduler#addApplication() we skip this application: we do not add it 
to the scheduler application map, and we send an APP_REJECTED event to move 
the application to the FAILED state.
{code:java}
if (!queue.hasAccess(QueueACL.SUBMIT_APPLICATIONS, userUgi) && !queue
.hasAccess(QueueACL.ADMINISTER_QUEUE, userUgi)) {
  String msg = "User " + userUgi.getUserName()
  + " cannot submit applications to queue " + queue.getName()
  + "(requested queuename is " + queueName + ")";
  LOG.info(msg);
  rmContext.getDispatcher().getEventHandler().handle(
  new RMAppEvent(applicationId, RMAppEventType.APP_REJECTED, msg));
  return;
} 
{code}
Then we try to recover app attempts. When we recover the last app attempt, we 
should check the final state of the attempt and the final state of the 
application (see RMAppAttemptImpl#transition()). As I said before, the 
application final state is null, and the last app attempt doesn't have a final 
state either. So, we check the RM app's current state in the method 
"isAppInFinalState".
{code:java}
public static boolean isAppInFinalState(RMApp rmApp) {
  RMAppState appState = ((RMAppImpl) rmApp).getRecoveredFinalState();
  if (appState == null) {
appState = rmApp.getState();
  }
  return appState == RMAppState.FAILED || appState == RMAppState.FINISHED
  || appState == RMAppState.KILLED;
}
{code}
For now, the *current state of the application is NEW because the APP_REJECTED 
event has not been processed yet*, as was described by Gergo Repas. *This leads 
to the wrong decision to recover the attempt.* We try to get the user of the 
application in FairScheduler#addApplicationAttempt and get an NPE because the 
application is not found in the scheduler.
{code:java}
SchedulerApplication application = applications.get(
applicationAttemptId.getApplicationId());
String user = application.getUser();
FSLeafQueue queue = (FSLeafQueue) application.getQueue(); //NPE
{code}
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:740)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1327)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:117)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1100)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1046)

 

*Ideally, we should process the APP_REJECTED event before we try to recover 
attempts.* But for now, I didn't find an easy way to do that.

*We can check whether the application is null.* If it is, we skip this 
attempt, the same way as in CapacityScheduler and as was proposed in YARN-2025.

Perhaps, we should open a new ticket for this.
{code:java}
SchedulerApplication application = applications.get(
applicationAttemptId.getApplicationId());
if (application == null) {
  LOG.warn("Application " + applicationAttemptId.getApplicationId() +
  " cannot be found in scheduler.");
  return;
}
String user = application.getUser();
{code}
As a result, the RM does not fail now, but we will get an 
InvalidStateTransitonException because the APP_REJECTED event will be processed 
too late.
{noformat}
2018-02-28 16:00:24,847 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle 
this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: *Invalid event: 
APP_REJECTED at ACCEPTED.*
{noformat}
If we also add a transition from the ACCEPTED state to FAILED to the RMAppImpl 
StateMachineFactory:
{code:java}

[jira] [Commented] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-03-05 Thread Oleksandr Shevchenko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385972#comment-16385972
 ] 

Oleksandr Shevchenko commented on YARN-7998:


[^YARN-7998.000.patch]

> RM crashes with NPE during recovering if ACL configuration was changed
> --
>
> Key: YARN-7998
> URL: https://issues.apache.org/jira/browse/YARN-7998
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: Oleksandr Shevchenko
>Priority: Major
> Attachments: YARN-7998.000.patch
>
>
> RM crashes with an NPE during failover because ACL configurations were 
> changed and, as a result, we no longer have the rights to submit an 
> application to a queue.
> Scenario:
>  # Submit an application
>  # Change ACL configuration for a queue that accepted the application so that 
> the owner of the application no longer has the rights to submit this 
> application.
>  # Restart RM.
> As a result, we get NPE:
> 2018-02-27 18:14:00,968 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state STARTED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:738)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1286)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:116)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1098)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1044)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-03-05 Thread Oleksandr Shevchenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksandr Shevchenko updated YARN-7998:
---
Attachment: YARN-7998.000.patch

> RM crashes with NPE during recovering if ACL configuration was changed
> --
>
> Key: YARN-7998
> URL: https://issues.apache.org/jira/browse/YARN-7998
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: Oleksandr Shevchenko
>Priority: Major
> Attachments: YARN-7998.000.patch
>
>
> RM crashes with an NPE during failover because ACL configurations were 
> changed and, as a result, we no longer have the rights to submit an 
> application to a queue.
> Scenario:
>  # Submit an application
>  # Change ACL configuration for a queue that accepted the application so that 
> the owner of the application no longer has the rights to submit this 
> application.
>  # Restart RM.
> As a result, we get NPE:
> 2018-02-27 18:14:00,968 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state STARTED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:738)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1286)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:116)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1098)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1044)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-03-05 Thread Oleksandr Shevchenko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385971#comment-16385971
 ] 

Oleksandr Shevchenko commented on YARN-7998:


RM failed with NPE during failover if FairScheduler configurations were changed.

An application was not finished yet, so the application final state is null, 
and the last app attempt doesn't have a final state either.

2018-02-28 15:50:51,576 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: 
application_1517497680557_565955 *with 2 attempts and final state = null*
2018-02-28 15:50:54,761 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1517497680557_565955_01 with *final state: 
FAILED*
2018-02-28 15:50:54,766 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1517497680557_565955_02 with *final state: 
null*

 

In my case, an *ACL configuration in fair-scheduler.xml was changed*; as a 
result, we no longer have the rights to submit this application.

In FairScheduler#addApplication() we skip this application: we do not add it 
to the scheduler application map, and we send an APP_REJECTED event to move 
the application to the FAILED state.
{code:java}
if (!queue.hasAccess(QueueACL.SUBMIT_APPLICATIONS, userUgi) && !queue
.hasAccess(QueueACL.ADMINISTER_QUEUE, userUgi)) {
  String msg = "User " + userUgi.getUserName()
  + " cannot submit applications to queue " + queue.getName()
  + "(requested queuename is " + queueName + ")";
  LOG.info(msg);
  rmContext.getDispatcher().getEventHandler().handle(
  new RMAppEvent(applicationId, RMAppEventType.APP_REJECTED, msg));
  return;
} 
{code}
Then we try to recover app attempts. When we recover the last app attempt, we 
should check the final state of the attempt and the final state of the 
application (see RMAppAttemptImpl#transition()). As I said before, the 
application final state is null, and the last app attempt doesn't have a final 
state either. So, we check the RM app's current state in the method 
"isAppInFinalState".

 
{code:java}
public static boolean isAppInFinalState(RMApp rmApp) {
  RMAppState appState = ((RMAppImpl) rmApp).getRecoveredFinalState();
  if (appState == null) {
appState = rmApp.getState();
  }
  return appState == RMAppState.FAILED || appState == RMAppState.FINISHED
  || appState == RMAppState.KILLED;
}
{code}
 

For now, the *current state of the application is NEW because the APP_REJECTED 
event has not been processed yet*, as was described by Gergo Repas. *This leads 
to the wrong decision to recover the attempt.* We try to get the user of the 
application in FairScheduler#addApplicationAttempt and get an NPE because the 
application is not found in the scheduler.
{code:java}
SchedulerApplication application = applications.get(
applicationAttemptId.getApplicationId());
String user = application.getUser();
FSLeafQueue queue = (FSLeafQueue) application.getQueue(); //NPE
{code}
 

java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:740)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1327)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:117)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1100)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1046)

 

*Ideally, we should process the APP_REJECTED event before we try to recover 
attempts.* But for now, I didn't find an easy way to do that.

*We can check whether the application is null.* If it is, we skip this 
attempt, the same way as in CapacityScheduler and as was proposed in YARN-2025.

Perhaps, we should open a new ticket for this.
{code:java}
SchedulerApplication application = applications.get(
applicationAttemptId.getApplicationId());
if (application == null) {
  LOG.warn("Application " + applicationAttemptId.getApplicationId() +
  " cannot be found in scheduler.");
  return;
}
String user = application.getUser();
{code}
As a result, the RM does not fail now, but we will get an 
InvalidStateTransitonException because the APP_REJECTED event will be processed 
too late.
{noformat}
2018-02-28 16:00:24,847 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle 
this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: *Invalid event: 
APP_REJECTED at ACCEPTED.*
{noformat}
If we also add a transition from the ACCEPTED state to FAILED to the RMAppImpl 
StateMachineFactory:
{code:java}
.addTransition(RMAppState.ACCEPTED, 

[jira] [Created] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed

2018-03-05 Thread Oleksandr Shevchenko (JIRA)
Oleksandr Shevchenko created YARN-7998:
--

 Summary: RM crashes with NPE during recovering if ACL 
configuration was changed
 Key: YARN-7998
 URL: https://issues.apache.org/jira/browse/YARN-7998
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Oleksandr Shevchenko


RM crashes with an NPE during failover because ACL configurations were changed 
and, as a result, we no longer have the rights to submit an application to a queue.

Scenario:
 # Submit an application
 # Change ACL configuration for a queue that accepted the application so that 
the owner of the application no longer has the rights to submit this 
application.
 # Restart RM.

As a result, we get NPE:
2018-02-27 18:14:00,968 INFO org.apache.hadoop.service.AbstractService: Service 
ResourceManager failed in state STARTED; cause: java.lang.NullPointerException
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:738)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1286)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:116)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1098)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1044)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4490) RM restart the finished app shows wrong Diagnostics status

2018-03-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385931#comment-16385931
 ] 

genericqa commented on YARN-4490:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 55s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 
57s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}112m 35s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-4490 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913001/YARN-4490_1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 28c392b43560 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e8c5be6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19886/testReport/ |
| Max. process+thread count | 821 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19886/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   

[jira] [Assigned] (YARN-4278) On AM registration, response should include cluster Nodes report on demanded by registration request.

2018-03-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-4278:
---

Assignee: (was: Rohith Sharma K S)

> On AM registration, response should  include cluster Nodes report on demanded 
> by registration request.
> --
>
> Key: YARN-4278
> URL: https://issues.apache.org/jira/browse/YARN-4278
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Reporter: Rohith Sharma K S
>Priority: Major
>
> From the yarn-dev mailing list discussion thread 
> [Thread-1|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201510.mbox/%3c0ee80f6f7a98a64ebd18f2be839c91156798a...@szxeml512-mbs.china.huawei.com%3E]
>  
> [Thread-2|http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201510.mbox/%3c4f7812fc-ab5d-465d-ac89-824735698...@hortonworks.com%3E]
>  
> Slider needs to know cluster node details in order to support 
> affinity/anti-affinity on containers.
> Current behavior: during the life span of an application, updatedNodes are sent in 
> the allocate request only if there is a change in the nodes, such as 
> added/removed/'state change'. Otherwise, the cluster nodes are not updated to the AM.
> One of the approaches suggested by [~ste...@apache.org] is to let the registration 
> response hold the cluster nodes report.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6827) [ATS1/1.5] NPE exception while publishing recovering applications into ATS during RM restart.

2018-03-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-6827:
---

Assignee: (was: Rohith Sharma K S)

> [ATS1/1.5] NPE exception while publishing recovering applications into ATS 
> during RM restart.
> -
>
> Key: YARN-6827
> URL: https://issues.apache.org/jira/browse/YARN-6827
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Priority: Major
>
> While recovering applications, it is observed that an NPE is thrown as 
> below.
> {noformat}
> 017-07-13 14:08:12,476 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher:
>  Error when publishing entity 
> [YARN_APPLICATION,application_1499929227397_0001]
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:368)
> {noformat}
> This is because during RM service start, in the non-HA case the active services 
> are started first and the ATSv1 services are started later. In the HA case, the 
> transitionToActive event arrives before the ATS services are started.
> This gives the active services enough time to recover the applications, which 
> try to publish into ATSv1 while recovering. Since the ATS services are not 
> started yet, this throws an NPE.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6702) Zk connection leak during activeService fail if embedded elector is not curator

2018-03-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385922#comment-16385922
 ] 

genericqa commented on YARN-6702:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-6702 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6702 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12872760/YARN-6702.01.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19887/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Zk connection leak during activeService fail if embedded elector is not 
> curator
> ---
>
> Key: YARN-6702
> URL: https://issues.apache.org/jira/browse/YARN-6702
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha3
>Reporter: Bibin A Chundatt
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-6702.01.patch
>
>
> On {{ResourceManager#transitionToActive}} startActiveServices failure, the active 
> services are reinitialized.
> {code}
> this.rmLoginUGI.doAs(new PrivilegedExceptionAction() {
>   @Override
>   public Void run() throws Exception {
> try {
>   startActiveServices();
>   return null;
> } catch (Exception e) {
>   reinitialize(true);
>   throw e;
> }
>   }
> });
> {code}
> {{ZKRMStateStore#initInternal}} will create another ZK connection.
> {code}
> curatorFramework = resourceManager.getCurator();
> if (curatorFramework == null) {
>   curatorFramework = resourceManager.createAndStartCurator(conf);
> }
> {code}
> {quote}
> secureuser@vm1:/opt/hadoop/release/hadoop/sbin> netstat -aen | grep 2181
> tcp0  0 192.168.56.101:49222192.168.56.103:2181 
> ESTABLISHED 1004   31984  
> tcp0  0 192.168.56.101:46016192.168.56.103:2181 
> ESTABLISHED 1004   26120  
> tcp0  0 192.168.56.101:50918192.168.56.103:2181 
> ESTABLISHED 1004   34761  
> tcp0  0 192.168.56.101:49598192.168.56.103:2181 
> ESTABLISHED 1004   32483  
> tcp0  0 192.168.56.101:49472192.168.56.103:2181 
> ESTABLISHED 1004   32364  
> tcp0  0 192.168.56.101:50708192.168.56.103:2181 
> ESTABLISHED 1004   34310  
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5512) Finished containers for running application should be displayed on container table

2018-03-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved YARN-5512.
-
Resolution: Implemented

I am not sure which sub-task under YARN-3368 fixes this issue, but I see 
completed container details in UI2, so I am closing this as implemented.

> Finished containers for running application should be displayed on container 
> table
> --
>
> Key: YARN-5512
> URL: https://issues.apache.org/jira/browse/YARN-5512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
>
> In an offline discussion with [~vinodkv], one of the points on yarn-web-ui 
> improvement was: 
> currently the yarn-web-ui attempt page displays running container details, but 
> these containers disappear once they are finished. Earlier there was no 
> mechanism to track finished container details. Now that ATSv2 is ready, 
> finished containers are published from the NodeManager and can be read. It 
> would be good if finished container details were also displayed. 
> It would be better to consider this in the new RM web UI as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6702) Zk connection leak during activeService fail if embedded elector is not curator

2018-03-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385916#comment-16385916
 ] 

Rohith Sharma K S commented on YARN-6702:
-

[~bibinchundatt] there has been a lot of code refactoring in ZKRMStateStore. 
Do you think this issue is still valid?

> Zk connection leak during activeService fail if embedded elector is not 
> curator
> ---
>
> Key: YARN-6702
> URL: https://issues.apache.org/jira/browse/YARN-6702
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha3
>Reporter: Bibin A Chundatt
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-6702.01.patch
>
>
> On {{ResourceManager#transitionToActive}} startActiveServices failure, the active 
> services are reinitialized.
> {code}
> this.rmLoginUGI.doAs(new PrivilegedExceptionAction() {
>   @Override
>   public Void run() throws Exception {
> try {
>   startActiveServices();
>   return null;
> } catch (Exception e) {
>   reinitialize(true);
>   throw e;
> }
>   }
> });
> {code}
> {{ZKRMStateStore#initInternal}} will create another ZK connection.
> {code}
> curatorFramework = resourceManager.getCurator();
> if (curatorFramework == null) {
>   curatorFramework = resourceManager.createAndStartCurator(conf);
> }
> {code}
> {quote}
> secureuser@vm1:/opt/hadoop/release/hadoop/sbin> netstat -aen | grep 2181
> tcp0  0 192.168.56.101:49222192.168.56.103:2181 
> ESTABLISHED 1004   31984  
> tcp0  0 192.168.56.101:46016192.168.56.103:2181 
> ESTABLISHED 1004   26120  
> tcp0  0 192.168.56.101:50918192.168.56.103:2181 
> ESTABLISHED 1004   34761  
> tcp0  0 192.168.56.101:49598192.168.56.103:2181 
> ESTABLISHED 1004   32483  
> tcp0  0 192.168.56.101:49472192.168.56.103:2181 
> ESTABLISHED 1004   32364  
> tcp0  0 192.168.56.101:50708192.168.56.103:2181 
> ESTABLISHED 1004   34310  
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7997) Add RM HA state in jmx

2018-03-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385914#comment-16385914
 ] 

Rohith Sharma K S commented on YARN-7997:
-

This appears to be a duplicate of YARN-2442. 

> Add RM HA state in jmx 
> ---
>
> Key: YARN-7997
> URL: https://issues.apache.org/jira/browse/YARN-7997
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Priority: Minor
>
> Currently, in the RM .jmx interface there is no option to know the HA state for each 
> RM.
> We need an interface similar to the Namenode {{FSNamesystem}} provision to know each 
> RM's state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-3163) admin support for YarnAuthorizationProvider

2018-03-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved YARN-3163.
-
Resolution: Won't Fix

> admin support for YarnAuthorizationProvider
> ---
>
> Key: YARN-3163
> URL: https://issues.apache.org/jira/browse/YARN-3163
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
>
> Runtime configuration support for YarnAuthorizationProvider. Using admin 
> commands, one should be able to set and get permissions from the 
> YarnAuthorizationProvider. This mechanism will let users avoid updating 
> config files and firing reload commands.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-3162) persistence support for YarnAuthorizationProvider

2018-03-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved YARN-3162.
-
Resolution: Won't Fix

> persistence support for YarnAuthorizationProvider
> -
>
> Key: YARN-3162
> URL: https://issues.apache.org/jira/browse/YARN-3162
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
>Priority: Major
>
> As discussed in YARN-3100, admin support can be a good addition for 
> YarnAuthorizationProvider. Hence, keeping the in-memory store and the config 
> file in sync will be of high importance. This JIRA will focus on persistent 
> storage for ACLs. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7701) Both RM are in standby in secure cluster

2018-03-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved YARN-7701.
-
Resolution: Cannot Reproduce

I tried to reproduce the same in trunk, but could not. The reason is that 
YARN-6061 and YARN-3742 are fixed in trunk, which triggers an event to transition 
to standby.
I am closing this as cannot reproduce in trunk.

> Both RM are in standby in secure cluster
> 
>
> Key: YARN-7701
> URL: https://issues.apache.org/jira/browse/YARN-7701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-7701.01.patch
>
>
> Both RMs were running perfectly fine for many days and switched multiple 
> times. At some point, when the RM switched from ACTIVE -> STANDBY, the UGI 
> information either changed or a new user got added to its subject.
> As a result, UGI#getShortUserName() returns the wrong user, which causes the 
> transition to ACTIVE to fail with an AccessControlException:
> {code}Caused by: org.apache.hadoop.security.AccessControlException: User 
> odsuser doesn't have permission to call 'refreshAdminAcls' 
> {code}
> _odsuser_ is the user who submitted the application. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-3418) AM to be able to set/update web URL and IPC ports post-registration

2018-03-05 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-3418:
---

Assignee: (was: Rohith Sharma K S)

> AM to be able to set/update web URL and IPC ports post-registration
> ---
>
> Key: YARN-3418
> URL: https://issues.apache.org/jira/browse/YARN-3418
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Major
>
> Currently the AM can only set the IPC and HTTP(s) ports on AM registration.
> This
> # creates a possible race condition: the IPC and HTTP ports need to come up 
> before the app is fully initialised. This is particularly true on 
> work-preserving AM restarts, as the AM will depend on the list of containers 
> supplied during registration to build its internal state. 
> # prevents the AM from changing these values dynamically during application 
> execution. This matters if the Web or IPC services are hosted not in the AM, 
> but in a deployed container. If the container is restarted, there's no way to 
> rebind the services. 
> A new AM-RM IPC call to publish updated binding information is needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7988) Refactor FSNodeLabelStore code for attributes store support

2018-03-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385893#comment-16385893
 ] 

genericqa commented on YARN-7988:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-3409 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m  
9s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
41s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
49s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
13s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
48s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
31s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
27s{color} | {color:green} YARN-3409 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 
40s{color} | {color:green} hadoop-yarn-project_hadoop-yarn generated 0 new + 86 
unchanged - 1 fixed = 86 total (was 87) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  3s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 13 new + 52 unchanged - 20 fixed = 65 total (was 72) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
4s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 17s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
51s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}155m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7988 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12912981/YARN-7988-YARN-3409.004.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4e6a2c38a511 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 

[jira] [Commented] (YARN-5028) RMStateStore should trim down app state for completed applications

2018-03-05 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385855#comment-16385855
 ] 

Gergo Repas commented on YARN-5028:
---

Thanks [~yufeigu], [~rohithsharma]!

> RMStateStore should trim down app state for completed applications
> --
>
> Key: YARN-5028
> URL: https://issues.apache.org/jira/browse/YARN-5028
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: YARN-5028.000.patch, YARN-5028.001.patch, 
> YARN-5028.002.patch, YARN-5028.003.patch, YARN-5028.004.patch, 
> YARN-5028.005.patch, YARN-5028.006.patch, YARN-5028.007-addendum.patch, 
> YARN-5028.007-addendum.patch, YARN-5028.007.patch
>
>
> RMStateStore stores enough information to recover applications in case of a 
> restart. The store also retains this information for completed applications 
> to serve their status to REST, WebUI, Java and CLI clients. We don't need all 
> the information we store today to serve application status; for instance, we 
> don't need the {{ApplicationSubmissionContext}}. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7997) Add RM HA state in jmx

2018-03-05 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385832#comment-16385832
 ] 

Bibin A Chundatt commented on YARN-7997:


[~rohithsharma] thoughts on adding the same in {{RMNMInfo}}?

> Add RM HA state in jmx 
> ---
>
> Key: YARN-7997
> URL: https://issues.apache.org/jira/browse/YARN-7997
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Priority: Minor
>
> Currently, in the RM .jmx interface there is no option to know the HA state for each 
> RM.
> We need an interface similar to the Namenode {{FSNamesystem}} provision to know each 
> RM's state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7997) Add RM HA state in jmx

2018-03-05 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-7997:
--

 Summary: Add RM HA state in jmx 
 Key: YARN-7997
 URL: https://issues.apache.org/jira/browse/YARN-7997
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bibin A Chundatt


Currently, in the RM .jmx interface there is no option to know the HA state for each RM.
We need an interface similar to the Namenode {{FSNamesystem}} provision to know each 
RM's state.
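
A minimal sketch of one way to expose this, assuming nothing about the final
design: a plain JMX MXBean registered on the platform MBean server so the value
shows up in the RM's JMX output. The names (RMHAStateInfo, getHAState, the
ObjectName) are invented for illustration and are not existing YARN code.
{code:java}
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;
import javax.management.ObjectName;

public class RMHAStateInfo implements RMHAStateInfo.HAStateMXBean {

  /** Public management interface; JMX exposes *MXBean interfaces as attributes. */
  public interface HAStateMXBean {
    String getHAState();
  }

  private final Supplier<String> haState;

  public RMHAStateInfo(Supplier<String> haState) {
    this.haState = haState;
  }

  @Override
  public String getHAState() {
    // Expected to yield something like "ACTIVE" or "STANDBY".
    return haState.get();
  }

  /** Registers the bean so it appears over JMX (and hence in the .jmx servlet). */
  public static void register(Supplier<String> haState) throws Exception {
    ObjectName name =
        new ObjectName("Hadoop:service=ResourceManager,name=RMHAState");
    ManagementFactory.getPlatformMBeanServer()
        .registerMBean(new RMHAStateInfo(haState), name);
  }
}
{code}
On the RM side the supplier would be wired to whatever accessor reports the
current HA state; how exactly that value is obtained is left open here.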





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4490) RM restart the finished app shows wrong Diagnostics status

2018-03-05 Thread Shen Yinjie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie updated YARN-4490:
--
Attachment: YARN-4490_1.patch

> RM restart the finished app shows wrong Diagnostics status
> --
>
> Key: YARN-4490
> URL: https://issues.apache.org/jira/browse/YARN-4490
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Reporter: Mohammad Shahid Khan
>Assignee: Shen Yinjie
>Priority: Major
> Attachments: YARN-4490_1.patch
>
>
> RM restart the finished app shows wrong Diagnostics status.
> Preconditions:
> RM recovery enabled (true).
> Steps:
> 1. Run an application and wait until the application is finished.
> 2. Restart the RM.
> 3. Check the application status in the RM web UI.
> Issue:
> Check the Diagnostic message: Attempt recovered after RM restart.
> Expected:
> The Diagnostic message should be available only for the application waiting 
> for allocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4490) RM restart the finished app shows wrong Diagnostics status

2018-03-05 Thread Shen Yinjie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie updated YARN-4490:
--
Attachment: (was: YARN-4490_1.patch)

> RM restart the finished app shows wrong Diagnostics status
> --
>
> Key: YARN-4490
> URL: https://issues.apache.org/jira/browse/YARN-4490
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Reporter: Mohammad Shahid Khan
>Assignee: Shen Yinjie
>Priority: Major
> Attachments: YARN-4490_1.patch
>
>
> RM restart the finished app shows wrong Diagnostics status.
> Preconditions:
> RM recovery enabled (true).
> Steps:
> 1. Run an application and wait until the application is finished.
> 2. Restart the RM.
> 3. Check the application status in the RM web UI.
> Issue:
> Check the Diagnostic message: Attempt recovered after RM restart.
> Expected:
> The Diagnostic message should be available only for the application waiting 
> for allocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6462) Add yarn command to list all queues

2018-03-05 Thread Shen Yinjie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376802#comment-16376802
 ] 

Shen Yinjie edited comment on YARN-6462 at 3/5/18 8:45 AM:
---

 Update the description: we should have a list-all-leaf-queues interface to get 
all leaf queues for running apps.


was (Author: shenyinjie):
 update Description.

> Add yarn command to list all queues
> ---
>
> Key: YARN-6462
> URL: https://issues.apache.org/jira/browse/YARN-6462
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Shen Yinjie
>Assignee: Shen Yinjie
>Priority: Major
> Attachments: YARN-6462_2.patch
>
>
> We need a yarn command to list all leaf queues.
> Especially in a large-scale cluster, there is a large number of queues in 
> tree format for various apps, and we actually need a list-all-leaf-queues 
> interface to get queue information immediately, rather than searching in 
> fair-scheduler.xml or in the yarn scheduler web UI layer by layer. Sometimes we 
> should also verify that a new queue was successfully added in the scheduler, 
> instead of it failing because of a format error or otherwise.
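
As a rough client-side sketch of the kind of listing being asked for above
(not a patch; it just walks the existing YarnClient queue-info API, and queue
naming details differ between schedulers):
{code:java}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class ListLeafQueues {

  /** Recursively print queues that have no children, i.e. the leaf queues. */
  static void printLeaves(YarnClient client, QueueInfo queue)
      throws YarnException, IOException {
    List<QueueInfo> children = client.getChildQueueInfos(queue.getQueueName());
    if (children == null || children.isEmpty()) {
      System.out.println(queue.getQueueName());
      return;
    }
    for (QueueInfo child : children) {
      printLeaves(client, child);
    }
  }

  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      for (QueueInfo root : client.getRootQueueInfos()) {
        printLeaves(client, root);
      }
    } finally {
      client.stop();
    }
  }
}
{code}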



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6462) Add yarn command to list all queues

2018-03-05 Thread Shen Yinjie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376802#comment-16376802
 ] 

Shen Yinjie edited comment on YARN-6462 at 3/5/18 8:43 AM:
---

 update Description.


was (Author: shenyinjie):
 update Desription.

> Add yarn command to list all queues
> ---
>
> Key: YARN-6462
> URL: https://issues.apache.org/jira/browse/YARN-6462
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Shen Yinjie
>Assignee: Shen Yinjie
>Priority: Major
> Attachments: YARN-6462_2.patch
>
>
> We need a yarn command to list all leaf queues.
> Especially in a large-scale cluster, there is a large number of queues in 
> tree format for various apps, and we actually need a list-all-leaf-queues 
> interface to get queue information immediately, rather than searching in 
> fair-scheduler.xml or in the yarn scheduler web UI layer by layer. Sometimes we 
> should also verify that a new queue was successfully added in the scheduler, 
> instead of it failing because of a format error or otherwise.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7975) Add an optional arg to yarn cluster -list-node-labels to list nodes collection partitioned by labels

2018-03-05 Thread Shen Yinjie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385792#comment-16385792
 ] 

Shen Yinjie commented on YARN-7975:
---

[~leftnoteasy], [~sunilg], would you mind taking a look at it?

> Add an optional arg to yarn cluster -list-node-labels to list nodes 
> collection partitioned by labels
> 
>
> Key: YARN-7975
> URL: https://issues.apache.org/jira/browse/YARN-7975
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shen Yinjie
>Assignee: Shen Yinjie
>Priority: Major
> Attachments: YARN-7975.patch
>
>
> Since we have "yarn cluster -lnl" to print all nodelabels info .But it's not 
> enough,we should be abale to list nodes collection partitioned by 
> labels,especially in large cluster.
> So  I propose to add an optional argument  "-nodes" for  "yarn cluster -lnl" 
> to achieve this.
> e.g.
> [yarn@docker1 ~]$ yarn cluster -lnl -nodes
> Node Labels Num: 3
>               Labels                                               Nodes
>  

[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-03-05 Thread Chen Qingcha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: hadoop-2.7.2-gpu-port.patch

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2-gpu-port.patch, hadoop-2.7.2-gpu.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, which treats GPU as 
> countable resource. 
> However, GPU placement is also very important to deep learning jobs for better 
> efficiency.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than one running 
> on GPUs {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not.
>  We add the support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage 
> and locality information in a node (up to 64 GPUs per node). '1' means 
> available and '0' otherwise in the corresponding bit position.
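
Purely to illustrate the bitmap encoding described above (these helpers are
invented for the example and are not code from the attached patch), the
availability test and the allocation update are plain bit operations on a long:
{code:java}
public final class GpuBitmapDemo {

  /** True if every GPU index in 'gpus' has its bit set, i.e. is available. */
  static boolean allAvailable(long bitmap, int... gpus) {
    for (int g : gpus) {
      if ((bitmap & (1L << g)) == 0) {
        return false;
      }
    }
    return true;
  }

  /** Clears the bits of the given GPUs, marking them as allocated. */
  static long allocate(long bitmap, int... gpus) {
    for (int g : gpus) {
      bitmap &= ~(1L << g);
    }
    return bitmap;
  }

  public static void main(String[] args) {
    long bitmap = 0xFFL;                              // GPUs 0-7 all available
    System.out.println(allAvailable(bitmap, 0, 1));   // true
    bitmap = allocate(bitmap, 1);                     // GPU 1 gets taken
    System.out.println(allAvailable(bitmap, 0, 1));   // false; try e.g. {0, 7}
  }
}
{code}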



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


