[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673870#comment-16673870
 ] 

Hadoop QA commented on YARN-8672:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m  
1s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8672 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12946760/YARN-8672.005.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 38bd19105dce 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 989715e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/22411/testReport/ |
| Max. process+thread count | 306 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/22411/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> 

[jira] [Commented] (YARN-8867) Retrieve the status of resource localization

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673853#comment-16673853
 ] 

Hadoop QA commented on YARN-8867:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 11 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
22m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  8m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m  
6s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 15m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
28s{color} | {color:green} root: The patch generated 0 new + 480 unchanged - 1 
fixed = 480 total (was 481) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 32s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  9m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
50s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
31s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
30s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
50s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}106m 39s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 37m 24s{color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
13s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 2s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| 

[jira] [Commented] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml

2018-11-02 Thread Subru Krishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673832#comment-16673832
 ] 

Subru Krishnan commented on YARN-7592:
--

Thanks [~rahulanand90] for the clarification. Can you update the patch after 
removing the flag (which I should mention is great) and quickly revalidate that 
there's no regression?

+1 from my side pending that.

 

> yarn.federation.failover.enabled missing in yarn-default.xml
> 
>
> Key: YARN-7592
> URL: https://issues.apache.org/jira/browse/YARN-7592
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.0.0-beta1
>Reporter: Gera Shegalov
>Priority: Major
> Attachments: IssueReproduce.patch
>
>
> yarn.federation.failover.enabled should be documented in yarn-default.xml. I 
> am also not sure why it should be true by default and force the HA retry 
> policy in {{RMProxy#createRMProxy}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6900) ZooKeeper based implementation of the FederationStateStore

2018-11-02 Thread Subru Krishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673830#comment-16673830
 ] 

Subru Krishnan commented on YARN-6900:
--

[~rahulanand90], I agree with you that the parameters are tricky to identify. 
Programmatically, what we need is a serialized conf as defined 
[here|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/policies/manager/FederationPolicyManager.java#L101].

Manually, we could start with a key-value map where predefined keys could be 
router/amrmproxy weights or headroomAlpha. Thoughts?
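
A minimal sketch of the manual key-value starting point, with hypothetical key 
names (the programmatic path would still go through the serialized conf linked 
above):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: a manually edited policy configuration
// expressed as a key-value map with predefined keys.
public class PolicyParamsSketch {
  public static Map<String, String> defaultParams() {
    Map<String, String> params = new HashMap<>();
    // Hypothetical keys; the real names would be defined by the policy manager.
    params.put("router.weight.subcluster1", "0.7");
    params.put("router.weight.subcluster2", "0.3");
    params.put("amrmproxy.weight.subcluster1", "0.5");
    params.put("amrmproxy.weight.subcluster2", "0.5");
    params.put("headroomAlpha", "1.0");
    return params;
  }
}
{code}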

 

> ZooKeeper based implementation of the FederationStateStore
> --
>
> Key: YARN-6900
> URL: https://issues.apache.org/jira/browse/YARN-6900
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Íñigo Goiri
>Priority: Major
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: YARN-6900-002.patch, YARN-6900-003.patch, 
> YARN-6900-004.patch, YARN-6900-005.patch, YARN-6900-006.patch, 
> YARN-6900-007.patch, YARN-6900-008.patch, YARN-6900-009.patch, 
> YARN-6900-010.patch, YARN-6900-011.patch, YARN-6900-YARN-2915-000.patch, 
> YARN-6900-YARN-2915-001.patch
>
>
> YARN-5408 defines the unified {{FederationStateStore}} API. Currently we only 
> support SQL-based stores; this JIRA tracks adding a ZooKeeper-based 
> implementation to simplify deployment, as ZooKeeper is already popularly used 
> for {{RMStateStore}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out

2018-11-02 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673827#comment-16673827
 ] 

Chandni Singh commented on YARN-8672:
-

Uploaded patch 5 with localizers having private token files.
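
A minimal sketch of that idea, assuming a hypothetical naming scheme in which 
each localizer writes and reads its own token file (the actual patch may do 
this differently):

{code:java}
import java.io.File;

// Hypothetical illustration only: give every localizer its own token file so
// that one localizer cleaning up cannot remove another localizer's tokens.
public class LocalizerTokenPathSketch {
  public static File tokenFile(File nmPrivateDir, String containerId,
      String localizerId) {
    // e.g. <nmPrivate>/<containerId>.<localizerId>.tokens
    return new File(nmPrivateDir, containerId + "." + localizerId + ".tokens");
  }
}
{code}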

 

> TestContainerManager#testLocalingResourceWhileContainerRunning occasionally 
> times out
> -
>
> Key: YARN-8672
> URL: https://issues.apache.org/jira/browse/YARN-8672
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Jason Lowe
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8672.001.patch, YARN-8672.002.patch, 
> YARN-8672.003.patch, YARN-8672.004.patch, YARN-8672.005.patch
>
>
> Precommit builds have been failing in 
> TestContainerManager#testLocalingResourceWhileContainerRunning.  I have been 
> able to reproduce the problem without any patch applied if I run the test 
> enough times.  It looks like something is removing container tokens from the 
> nmPrivate area just as a new localizer starts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out

2018-11-02 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8672:

Attachment: YARN-8672.005.patch

> TestContainerManager#testLocalingResourceWhileContainerRunning occasionally 
> times out
> -
>
> Key: YARN-8672
> URL: https://issues.apache.org/jira/browse/YARN-8672
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Jason Lowe
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8672.001.patch, YARN-8672.002.patch, 
> YARN-8672.003.patch, YARN-8672.004.patch, YARN-8672.005.patch
>
>
> Precommit builds have been failing in 
> TestContainerManager#testLocalingResourceWhileContainerRunning.  I have been 
> able to reproduce the problem without any patch applied if I run the test 
> enough times.  It looks like something is removing container tokens from the 
> nmPrivate area just as a new localizer starts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8963) Add flag to disable interactive shell

2018-11-02 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8963:

Attachment: YARN-8963.001.patch

> Add flag to disable interactive shell
> -
>
> Key: YARN-8963
> URL: https://issues.apache.org/jira/browse/YARN-8963
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Priority: Major
> Attachments: YARN-8963.001.patch
>
>
> For some production jobs, the application admin might choose to disable 
> debugging to prevent developers or system admins from accessing the 
> containers.  It would be nice to add an environment variable flag to disable 
> the interactive shell during application submission.
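
A minimal sketch of such a gate, assuming a hypothetical environment variable 
name rather than whatever the patch actually introduces:

{code:java}
import java.util.Map;

// Hypothetical illustration only: gate the interactive shell on an
// environment variable supplied at application submission time.
public class InteractiveShellGateSketch {
  // Hypothetical variable name, not necessarily what the patch uses.
  static final String DISABLE_SHELL_ENV = "YARN_CONTAINER_RUNTIME_DISABLE_SHELL";

  public static boolean shellAllowed(Map<String, String> containerEnv) {
    // Shell stays enabled unless the flag is explicitly set to true.
    return !Boolean.parseBoolean(
        containerEnv.getOrDefault(DISABLE_SHELL_ENV, "false"));
  }
}
{code}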



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8893) [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client

2018-11-02 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8893:
---
Fix Version/s: 2.10.0

> [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client
> ---
>
> Key: YARN-8893
> URL: https://issues.apache.org/jira/browse/YARN-8893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, federation
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Fix For: 2.10.0, 3.3.0
>
> Attachments: YARN-8893.v1.patch, YARN-8893.v2.patch
>
>
> Fix thread leak in AMRMClientRelayer and UAM client used by 
> FederationInterceptor, when destroying the interceptor pipeline in AMRMProxy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8893) [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client

2018-11-02 Thread Botong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673807#comment-16673807
 ] 

Botong Huang commented on YARN-8893:


Thanks [~giovanni.fumarola] for the review! Committed to branch-2 as well. 

> [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client
> ---
>
> Key: YARN-8893
> URL: https://issues.apache.org/jira/browse/YARN-8893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, federation
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8893.v1.patch, YARN-8893.v2.patch
>
>
> Fix thread leak in AMRMClientRelayer and UAM client used by 
> FederationInterceptor, when destroying the interceptor pipeline in AMRMProxy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8893) [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client

2018-11-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673782#comment-16673782
 ] 

Hudson commented on YARN-8893:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15351 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15351/])
YARN-8893. [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM (gifuma: 
rev 989715ec5066c6ac7868e25ad9234dc64723e61e)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/MockResourceManagerFacade.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedApplicationManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/uam/TestUnmanagedApplicationManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/AMRMClientRelayer.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestAMRMProxyService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/metrics/TestAMRMClientRelayerMetrics.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedAMPoolManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/TestAMRMClientRelayer.java


> [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client
> ---
>
> Key: YARN-8893
> URL: https://issues.apache.org/jira/browse/YARN-8893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, federation
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8893.v1.patch, YARN-8893.v2.patch
>
>
> Fix thread leak in AMRMClientRelayer and UAM client used by 
> FederationInterceptor, when destroying the interceptor pipeline in AMRMProxy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8893) [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client

2018-11-02 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673760#comment-16673760
 ] 

Giovanni Matteo Fumarola commented on YARN-8893:


Perfect. Committed to trunk.

Thanks [~botong] for the patch.

> [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client
> ---
>
> Key: YARN-8893
> URL: https://issues.apache.org/jira/browse/YARN-8893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, federation
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8893.v1.patch, YARN-8893.v2.patch
>
>
> Fix thread leak in AMRMClientRelayer and UAM client used by 
> FederationInterceptor, when destroying the interceptor pipeline in AMRMProxy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8893) [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client

2018-11-02 Thread Giovanni Matteo Fumarola (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8893:
---
Fix Version/s: 3.3.0

> [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client
> ---
>
> Key: YARN-8893
> URL: https://issues.apache.org/jira/browse/YARN-8893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, federation
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8893.v1.patch, YARN-8893.v2.patch
>
>
> Fix thread leak in AMRMClientRelayer and UAM client used by 
> FederationInterceptor, when destroying the interceptor pipeline in AMRMProxy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8962) Add ability to use interactive shell with normal yarn container

2018-11-02 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673751#comment-16673751
 ] 

Eric Yang commented on YARN-8962:
-

Patch 002 added test case.

> Add ability to use interactive shell with normal yarn container
> ---
>
> Key: YARN-8962
> URL: https://issues.apache.org/jira/browse/YARN-8962
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Priority: Major
> Attachments: YARN-8962.001.patch, YARN-8962.002.patch
>
>
> This task focuses on extending the interactive shell capability to yarn 
> containers without docker.  This will improve some aspects of debugging 
> mapreduce or spark applications.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8962) Add ability to use interactive shell with normal yarn container

2018-11-02 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8962:

Attachment: YARN-8962.002.patch

> Add ability to use interactive shell with normal yarn container
> ---
>
> Key: YARN-8962
> URL: https://issues.apache.org/jira/browse/YARN-8962
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Priority: Major
> Attachments: YARN-8962.001.patch, YARN-8962.002.patch
>
>
> This task focuses on extending the interactive shell capability to yarn 
> containers without docker.  This will improve some aspects of debugging 
> mapreduce or spark applications.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8893) [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client

2018-11-02 Thread Botong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673748#comment-16673748
 ] 

Botong Huang commented on YARN-8893:


This also does a proper shutdown and closes the connection in the forceKill 
case, so that after we force-kill the UAM, our local proxy connection inside 
the AMRMClientRelayer won't be left open. 
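
A minimal sketch of that ordering, with hypothetical type and method names 
standing in for the real relayer API:

{code:java}
// Hypothetical illustration only: close the local proxy connection even when
// the UAM is force-killed instead of finishing normally.
public class ForceKillShutdownSketch {
  interface Relayer {            // stand-in for the real AMRMClientRelayer
    void forceKillApplication();
    void shutdown();
  }

  private final Relayer rmProxyRelayer;

  ForceKillShutdownSketch(Relayer relayer) {
    this.rmProxyRelayer = relayer;
  }

  public void forceKill() {
    try {
      rmProxyRelayer.forceKillApplication();
    } finally {
      rmProxyRelayer.shutdown();   // always release the proxy connection
    }
  }
}
{code}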

> [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client
> ---
>
> Key: YARN-8893
> URL: https://issues.apache.org/jira/browse/YARN-8893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, federation
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8893.v1.patch, YARN-8893.v2.patch
>
>
> Fix thread leak in AMRMClientRelayer and UAM client used by 
> FederationInterceptor, when destroying the interceptor pipeline in AMRMProxy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8893) [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client

2018-11-02 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673721#comment-16673721
 ] 

Giovanni Matteo Fumarola commented on YARN-8893:


Thanks [~botong].

Why do you need to call {{this.rmProxyRelayer.shutdown()}} from 
{{forceKillApplication}}?

> [AMRMProxy] Fix thread leak in AMRMClientRelayer and UAM client
> ---
>
> Key: YARN-8893
> URL: https://issues.apache.org/jira/browse/YARN-8893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, federation
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8893.v1.patch, YARN-8893.v2.patch
>
>
> Fix thread leak in AMRMClientRelayer and UAM client used by 
> FederationInterceptor, when destroying the interceptor pipeline in AMRMProxy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8932) ResourceUtilization cpu is misused in oversubscription as a percentage

2018-11-02 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673638#comment-16673638
 ] 

Haibo Chen commented on YARN-8932:
--

The shaded client issue is unrelated to this patch. It failed before the patch 
was applied. 

Checking in this patch to YARN-1011 branch. Will open a Jira if the shaded 
client issue comes up again.  Thanks [~rkanter] for the review!

> ResourceUtilization cpu is misused in oversubscription as a percentage
> --
>
> Key: YARN-8932
> URL: https://issues.apache.org/jira/browse/YARN-8932
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8932-YARN-1011.00.patch, 
> YARN-8932-YARN-1011.01.patch, YARN-8932-YARN-1011.02.patch
>
>
> The ResourceUtilization javadoc mistakenly documents the cpu as a percentage 
> represented by a float number in [0, 1.0f]; however, in reality it is used as 
> the number of vcores used.
> See the javadoc and the discussion in YARN-8911.
>   /**
>    * Get CPU utilization.
>    *
>    * @return CPU utilization normalized to 1 CPU
>    */
>   @Public
>   @Unstable
>   public abstract float getCPU();
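
A minimal sketch of the mismatch, assuming the public 
{{ResourceUtilization#newInstance(int, int, float)}} factory; callers currently 
pass the raw vcore count rather than a [0, 1.0f] fraction:

{code:java}
import org.apache.hadoop.yarn.api.records.ResourceUtilization;

// Hypothetical illustration only: contrast the documented reading of the cpu
// field (a fraction) with how it is actually populated (raw vcores used).
public class CpuUtilizationSketch {
  // Reading suggested by the javadoc: a fraction in [0, 1.0f].
  public static float documentedCpu(float vcoresUsed, int totalVcores) {
    return vcoresUsed / totalVcores;
  }

  // Actual usage today: the raw number of vcores used is stored directly.
  public static ResourceUtilization actualReport(int pmemMB, int vmemMB,
      float vcoresUsed) {
    return ResourceUtilization.newInstance(pmemMB, vmemMB, vcoresUsed);
  }
}
{code}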



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8776) Container Executor change to create stdin/stdout pipeline

2018-11-02 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673632#comment-16673632
 ] 

Eric Yang commented on YARN-8776:
-

The failed test case is tracked by YARN-8672.  [~billie.rinaldi] can you help 
with the review?

> Container Executor change to create stdin/stdout pipeline
> -
>
> Key: YARN-8776
> URL: https://issues.apache.org/jira/browse/YARN-8776
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8776.001.patch, YARN-8776.002.patch, 
> YARN-8776.003.patch, YARN-8776.004.patch, YARN-8776.005.patch, 
> YARN-8776.006.patch
>
>
> The pipeline is built to connect the stdin/stdout channel from the WebSocket 
> servlet through container-executor to the docker executor. So when the 
> WebSocket servlet is started, we need to invoke the container-executor 
> “dockerExec” method (which will be implemented) to create a new docker 
> executor and use the “docker exec -it $ContainerId” command, which executes 
> an interactive bash shell on the container.
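
A minimal sketch of the final hop only, using a plain {{ProcessBuilder}} as a 
stand-in; the real flow goes through the privileged container-executor binary 
rather than directly from Java:

{code:java}
import java.io.IOException;

// Hypothetical illustration only: the last hop of the pipeline, attaching an
// interactive shell inside the container. The real implementation goes
// through the setuid container-executor rather than ProcessBuilder.
public class DockerExecSketch {
  public static Process openInteractiveShell(String containerId)
      throws IOException {
    return new ProcessBuilder("docker", "exec", "-it", containerId, "bash")
        .inheritIO()   // connect the child's stdin/stdout/stderr to ours
        .start();
  }
}
{code}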



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8962) Add ability to use interactive shell with normal yarn container

2018-11-02 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673622#comment-16673622
 ] 

Eric Yang commented on YARN-8962:
-

The first patch renames run_docker_with_pty to exec_container to support both 
docker exec -it and plain yarn containers.  Yarn containers use a restricted 
shell to prevent the user from going out of bounds.  DefaultLinuxContainerRuntime 
implements writing a .cmd file for the C version of container-executor to run a 
shell for the yarn container.
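
A minimal sketch of the .cmd idea, with an entirely hypothetical key-value 
layout; the real keys and file location are defined by the patch and 
container-executor, not here:

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only: write a simple key=value .cmd file that the
// C container-executor could read to launch a restricted shell. Section and
// key names here are made up for illustration.
public class ShellCmdFileSketch {
  public static Path writeCmdFile(Path dir, String containerId)
      throws IOException {
    String contents =
        "[shell-command-execution]\n"
        + "  container-id=" + containerId + "\n"
        + "  shell=/bin/bash --restricted\n";
    Path cmdFile = dir.resolve(containerId + "-shell.cmd");
    return Files.write(cmdFile, contents.getBytes(StandardCharsets.UTF_8));
  }
}
{code}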


> Add ability to use interactive shell with normal yarn container
> ---
>
> Key: YARN-8962
> URL: https://issues.apache.org/jira/browse/YARN-8962
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Priority: Major
> Attachments: YARN-8962.001.patch
>
>
> This task focuses on extending the interactive shell capability to yarn 
> containers without docker.  This will improve some aspects of debugging 
> mapreduce or spark applications.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8962) Add ability to use interactive shell with normal yarn container

2018-11-02 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8962:

Attachment: YARN-8962.001.patch

> Add ability to use interactive shell with normal yarn container
> ---
>
> Key: YARN-8962
> URL: https://issues.apache.org/jira/browse/YARN-8962
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Priority: Major
> Attachments: YARN-8962.001.patch
>
>
> This task focuses on extending the interactive shell capability to yarn 
> containers without docker.  This will improve some aspects of debugging 
> mapreduce or spark applications.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7225) Add queue and partition info to RM audit log

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673617#comment-16673617
 ] 

Hadoop QA commented on YARN-7225:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.8 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
49s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} branch-2.8 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 31s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
18s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}126m 24s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Unreaped Processes | hadoop-yarn-server-resourcemanager:1 |
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.TestRMHA |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMReconnect |
|   | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
 |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication |
|   | hadoop.yarn.server.resourcemanager.TestApplicationACLs |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
| Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ae3769f |
| JIRA Issue | YARN-7225 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-7225) Add queue and partition info to RM audit log

2018-11-02 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673584#comment-16673584
 ] 

Jonathan Hung commented on YARN-7225:
-

Thanks, committed to branch-2.8.

> Add queue and partition info to RM audit log
> 
>
> Key: YARN-7225
> URL: https://issues.apache.org/jira/browse/YARN-7225
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.9.1, 2.8.4, 3.0.2, 3.1.1
>Reporter: Jonathan Hung
>Assignee: Eric Payne
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.3.0, 2.8.6, 3.2.1, 2.9.3
>
> Attachments: YARN-7225.001.patch, YARN-7225.002.patch, 
> YARN-7225.003.patch, YARN-7225.004.patch, YARN-7225.005.patch, 
> YARN-7225.branch-2.8.001.patch, YARN-7225.branch-2.8.002.patch, 
> YARN-7225.branch-2.8.003.addendum.patch
>
>
> Right now the RM audit log has fields such as user, ip, resource, etc. Having 
> queue and partition is useful for resource tracking.
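
A minimal sketch of the extra fields, using a hypothetical key-value builder 
rather than the real {{RMAuditLogger}} keys:

{code:java}
// Hypothetical illustration only: append queue and partition to an audit log
// entry built as tab-separated key=value pairs. Key names are placeholders,
// not the real RMAuditLogger constants.
public class AuditEntrySketch {
  public static String buildEntry(String user, String ip, String operation,
      String queue, String partition) {
    return new StringBuilder()
        .append("USER=").append(user)
        .append("\tIP=").append(ip)
        .append("\tOPERATION=").append(operation)
        .append("\tQUEUENAME=").append(queue)      // new field
        .append("\tNODELABEL=").append(partition)  // new field
        .toString();
  }
}
{code}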



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7225) Add queue and partition info to RM audit log

2018-11-02 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673579#comment-16673579
 ] 

Eric Payne commented on YARN-7225:
--

Yes, [~jhung]. Very good catch. Sorry about that.
+1 to the update.

> Add queue and partition info to RM audit log
> 
>
> Key: YARN-7225
> URL: https://issues.apache.org/jira/browse/YARN-7225
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.9.1, 2.8.4, 3.0.2, 3.1.1
>Reporter: Jonathan Hung
>Assignee: Eric Payne
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.3.0, 2.8.6, 3.2.1, 2.9.3
>
> Attachments: YARN-7225.001.patch, YARN-7225.002.patch, 
> YARN-7225.003.patch, YARN-7225.004.patch, YARN-7225.005.patch, 
> YARN-7225.branch-2.8.001.patch, YARN-7225.branch-2.8.002.patch, 
> YARN-7225.branch-2.8.003.addendum.patch
>
>
> Right now the RM audit log has fields such as user, ip, resource, etc. Having 
> queue and partition is useful for resource tracking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8867) Retrieve the status of resource localization

2018-11-02 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8867:

Attachment: YARN-8867.002.patch

> Retrieve the status of resource localization
> 
>
> Key: YARN-8867
> URL: https://issues.apache.org/jira/browse/YARN-8867
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8867.001.patch, YARN-8867.002.patch, 
> YARN-8867.wip.patch
>
>
> Refer to YARN-3854.
> Currently the NM does not have an API to retrieve the status of localization. 
> Unless the client can know when the localization of a resource is complete, 
> irrespective of the type of the resource, it cannot take any appropriate 
> action. 
> We need an API in {{ContainerManagementProtocol}} to retrieve the status of 
> localization. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8897) LoadBasedRouterPolicy throws "NPE" in case of sub cluster unavailability

2018-11-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673533#comment-16673533
 ] 

Hudson commented on YARN-8897:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15350 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15350/])
YARN-8897. LoadBasedRouterPolicy throws NPE in case of sub cluster (gifuma: rev 
aed836efbff775d95899d05ff947f1048df8cf19)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/policies/router/LoadBasedRouterPolicy.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/federation/policies/router/TestLoadBasedRouterPolicy.java


> LoadBasedRouterPolicy throws "NPE" in case of sub cluster unavailability 
> -
>
> Key: YARN-8897
> URL: https://issues.apache.org/jira/browse/YARN-8897
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router
>Reporter: Akshay Agarwal
>Assignee: Bilwa S T
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-8897-001.patch, YARN-8897-002.patch, 
> YARN-8897-003.patch
>
>
> If no sub clusters are available for the "*Load Based Router Policy*" with 
> *cluster weight* set to *1* in a Router Based Federation setup, a 
> "*NullPointerException*" is thrown.
>  
> *Exception Details:*
> {code:java}
> java.lang.NullPointerException: java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.federation.policies.router.LoadBasedRouterPolicy.getHomeSubcluster(LoadBasedRouterPolicy.java:99)
>  at 
> org.apache.hadoop.yarn.server.federation.policies.RouterPolicyFacade.getHomeSubcluster(RouterPolicyFacade.java:204)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.submitApplication(FederationClientInterceptor.java:362)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.submitApplication(RouterClientRMService.java:218)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:282)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:579)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>  at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:297)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>  at com.sun.proxy.$Proxy15.submitApplication(Unknown Source)
>  at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:288)
>  at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:300)
>  

[jira] [Updated] (YARN-8897) LoadBasedRouterPolicy throws "NPE" in case of sub cluster unavailability

2018-11-02 Thread Giovanni Matteo Fumarola (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8897:
---
Fix Version/s: 3.3.0

> LoadBasedRouterPolicy throws "NPE" in case of sub cluster unavailability 
> -
>
> Key: YARN-8897
> URL: https://issues.apache.org/jira/browse/YARN-8897
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router
>Reporter: Akshay Agarwal
>Assignee: Bilwa S T
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-8897-001.patch, YARN-8897-002.patch, 
> YARN-8897-003.patch
>
>
> If no sub clusters are available for the "*Load Based Router Policy*" with 
> *cluster weight* set to *1* in a Router Based Federation setup, a 
> "*NullPointerException*" is thrown.
>  
> *Exception Details:*
> {code:java}
> java.lang.NullPointerException: java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.federation.policies.router.LoadBasedRouterPolicy.getHomeSubcluster(LoadBasedRouterPolicy.java:99)
>  at 
> org.apache.hadoop.yarn.server.federation.policies.RouterPolicyFacade.getHomeSubcluster(RouterPolicyFacade.java:204)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.submitApplication(FederationClientInterceptor.java:362)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.submitApplication(RouterClientRMService.java:218)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:282)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:579)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>  at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:297)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>  at com.sun.proxy.$Proxy15.submitApplication(Unknown Source)
>  at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:288)
>  at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:300)
>  at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:331)
>  at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:254)
>  at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
>  at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
>  at 

[jira] [Commented] (YARN-8954) Reservations list field in ReservationListInfo is not accessible

2018-11-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673518#comment-16673518
 ] 

Hudson commented on YARN-8954:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15349 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15349/])
YARN-8954. Reservations list field in ReservationListInfo is not (gifuma: rev 
babc946d4017e9c385d19a8e6f7f1ecd5080d619)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ReservationListInfo.java


> Reservations list field in ReservationListInfo is not accessible
> 
>
> Key: YARN-8954
> URL: https://issues.apache.org/jira/browse/YARN-8954
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, restapi
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-8954.001.patch
>
>
> We need to add a getter for the reservations list field since the field cannot 
> be accessed after unmarshalling. A similar problem is described in YARN-2280.
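
A minimal sketch of the pattern with hypothetical names; the point is simply 
that a JAXB-unmarshalled list needs a public getter to be readable after 
parsing:

{code:java}
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

// Hypothetical illustration only: a JAXB DAO whose unmarshalled list is
// exposed through a public getter so clients can read it after parsing.
@XmlRootElement(name = "reservation-list")
@XmlAccessorType(XmlAccessType.FIELD)
public class ReservationListSketch {
  @XmlElement(name = "reservations")
  private List<String> reservations = new ArrayList<>();

  // The getter that was previously missing.
  public List<String> getReservations() {
    return reservations;
  }
}
{code}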



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8897) LoadBasedRouterPolicy throws "NPE" in case of sub cluster unavailability

2018-11-02 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673514#comment-16673514
 ] 

Giovanni Matteo Fumarola commented on YARN-8897:


+1 on  [^YARN-8897-003.patch] .
Committing into trunk.

Thanks [~BilwaST] for the patch, and [~bibinchundatt] for the comments.

> LoadBasedRouterPolicy throws "NPE" in case of sub cluster unavailability 
> -
>
> Key: YARN-8897
> URL: https://issues.apache.org/jira/browse/YARN-8897
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router
>Reporter: Akshay Agarwal
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-8897-001.patch, YARN-8897-002.patch, 
> YARN-8897-003.patch
>
>
> If no sub clusters are available for the "*Load Based Router Policy*" with 
> *cluster weight* set to *1* in a Router Based Federation setup, a 
> "*NullPointerException*" is thrown.
>  
> *Exception Details:*
> {code:java}
> java.lang.NullPointerException: java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.federation.policies.router.LoadBasedRouterPolicy.getHomeSubcluster(LoadBasedRouterPolicy.java:99)
>  at 
> org.apache.hadoop.yarn.server.federation.policies.RouterPolicyFacade.getHomeSubcluster(RouterPolicyFacade.java:204)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.submitApplication(FederationClientInterceptor.java:362)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.submitApplication(RouterClientRMService.java:218)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:282)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:579)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>  at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:297)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>  at com.sun.proxy.$Proxy15.submitApplication(Unknown Source)
>  at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:288)
>  at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:300)
>  at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:331)
>  at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:254)
>  at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
>  at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> 

[jira] [Commented] (YARN-8954) Reservations list field in ReservationListInfo is not accessible

2018-11-02 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673504#comment-16673504
 ] 

Giovanni Matteo Fumarola commented on YARN-8954:


Thanks [~oshevchenko] for the patch. The get method was missing in YARN-4420. 
No tests needed.

Committing into trunk.

> Reservations list field in ReservationListInfo is not accessible
> 
>
> Key: YARN-8954
> URL: https://issues.apache.org/jira/browse/YARN-8954
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, restapi
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-8954.001.patch
>
>
> We need to add a getter for the Reservations list field since the field cannot 
> be accessed after unmarshalling. A similar problem is described in YARN-2280.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8954) Reservations list field in ReservationListInfo is not accessible

2018-11-02 Thread Giovanni Matteo Fumarola (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola reassigned YARN-8954:
--

Assignee: Oleksandr Shevchenko

> Reservations list field in ReservationListInfo is not accessible
> 
>
> Key: YARN-8954
> URL: https://issues.apache.org/jira/browse/YARN-8954
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, restapi
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-8954.001.patch
>
>
> We need to add a getter for the Reservations list field since the field cannot 
> be accessed after unmarshalling. A similar problem is described in YARN-2280.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8954) Reservations list field in ReservationListInfo is not accessible

2018-11-02 Thread Giovanni Matteo Fumarola (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8954:
---
Fix Version/s: 3.3.0

> Reservations list field in ReservationListInfo is not accessible
> 
>
> Key: YARN-8954
> URL: https://issues.apache.org/jira/browse/YARN-8954
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, restapi
>Reporter: Oleksandr Shevchenko
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-8954.001.patch
>
>
> We need to add a getter for the Reservations list field since the field cannot 
> be accessed after unmarshalling. A similar problem is described in YARN-2280.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-7225) Add queue and partition info to RM audit log

2018-11-02 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung reopened YARN-7225:
-

> Add queue and partition info to RM audit log
> 
>
> Key: YARN-7225
> URL: https://issues.apache.org/jira/browse/YARN-7225
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.9.1, 2.8.4, 3.0.2, 3.1.1
>Reporter: Jonathan Hung
>Assignee: Eric Payne
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.3.0, 2.8.6, 3.2.1, 2.9.3
>
> Attachments: YARN-7225.001.patch, YARN-7225.002.patch, 
> YARN-7225.003.patch, YARN-7225.004.patch, YARN-7225.005.patch, 
> YARN-7225.branch-2.8.001.patch, YARN-7225.branch-2.8.002.patch, 
> YARN-7225.branch-2.8.003.addendum.patch
>
>
> Right now the RM audit log has fields such as user, ip, resource, etc. Having 
> queue and partition is useful for resource tracking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7225) Add queue and partition info to RM audit log

2018-11-02 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673495#comment-16673495
 ] 

Jonathan Hung commented on YARN-7225:
-

Hey [~eepayne], in the process of backporting the branch-2.8 patch, I 
inadvertently found another issue: it seems that if we pass queue/partition, we 
no longer print the IP. I attached a one-line addendum patch, mind giving it a 
look? Thanks!
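
To make the regression easier to picture, here is a simplified, hypothetical audit-line builder (not the actual RMAuditLogger code): the new overload that adds queue/partition forgets to append the IP field that the shorter overload still prints, which is exactly the kind of omission a one-line addendum can fix.

{code:java}
// Hypothetical, simplified sketch; the real RMAuditLogger API differs.
public final class AuditLineSketch {

  private AuditLineSketch() {
  }

  static String successLog(String user, String operation, String ip) {
    return "USER=" + user + "\tOPERATION=" + operation + "\tIP=" + ip
        + "\tRESULT=SUCCESS";
  }

  // Overload with queue/partition that accidentally drops the IP field.
  static String successLog(String user, String operation, String ip,
      String queue, String partition) {
    return "USER=" + user + "\tOPERATION=" + operation
        + "\tQUEUENAME=" + queue + "\tNODELABEL=" + partition
        + "\tRESULT=SUCCESS"; // IP is never appended here
  }

  public static void main(String[] args) {
    System.out.println(successLog("alice", "Submit Application Request",
        "10.0.0.1"));
    System.out.println(successLog("alice", "Submit Application Request",
        "10.0.0.1", "default", "gpu"));
  }
}
{code}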

> Add queue and partition info to RM audit log
> 
>
> Key: YARN-7225
> URL: https://issues.apache.org/jira/browse/YARN-7225
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.9.1, 2.8.4, 3.0.2, 3.1.1
>Reporter: Jonathan Hung
>Assignee: Eric Payne
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.3.0, 2.8.6, 3.2.1, 2.9.3
>
> Attachments: YARN-7225.001.patch, YARN-7225.002.patch, 
> YARN-7225.003.patch, YARN-7225.004.patch, YARN-7225.005.patch, 
> YARN-7225.branch-2.8.001.patch, YARN-7225.branch-2.8.002.patch, 
> YARN-7225.branch-2.8.003.addendum.patch
>
>
> Right now the RM audit log has fields such as user, ip, resource, etc. Having 
> queue and partition is useful for resource tracking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7225) Add queue and partition info to RM audit log

2018-11-02 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-7225:

Attachment: YARN-7225.branch-2.8.003.addendum.patch

> Add queue and partition info to RM audit log
> 
>
> Key: YARN-7225
> URL: https://issues.apache.org/jira/browse/YARN-7225
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.9.1, 2.8.4, 3.0.2, 3.1.1
>Reporter: Jonathan Hung
>Assignee: Eric Payne
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.3.0, 2.8.6, 3.2.1, 2.9.3
>
> Attachments: YARN-7225.001.patch, YARN-7225.002.patch, 
> YARN-7225.003.patch, YARN-7225.004.patch, YARN-7225.005.patch, 
> YARN-7225.branch-2.8.001.patch, YARN-7225.branch-2.8.002.patch, 
> YARN-7225.branch-2.8.003.addendum.patch
>
>
> Right now the RM audit log has fields such as user, ip, resource, etc. Having 
> queue and partition is useful for resource tracking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8932) ResourceUtilization cpu is misused in oversubscription as a percentage

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673475#comment-16673475
 ] 

Hadoop QA commented on YARN-8932:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-1011 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
10s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
39s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 2s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
25s{color} | {color:green} YARN-1011 passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 13m 
25s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} YARN-1011 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
31s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  2m 31s{color} 
| {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server generated 16 
new + 86 unchanged - 0 fixed = 102 total (was 86) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 58s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 1 new + 
203 unchanged - 0 fixed = 204 total (was 203) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 12m  
4s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m 22s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m 23s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 86m 30s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8932 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12946290/YARN-8932-YARN-1011.02.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4050669b7330 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | YARN-1011 / f3d08c7 |
| maven | version: 

[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673366#comment-16673366
 ] 

Hadoop QA commented on YARN-8233:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 5 new + 100 unchanged - 0 fixed = 105 total (was 100) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 48s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}116m 23s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}184m 55s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSchedulingRequestUpdate
 |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterServiceCapacity |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8233 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12946674/YARN-8233.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4bdb90b6738e 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d16d5f7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (YARN-8898) Fix FederationInterceptor#allocate to set application priority in allocateResponse

2018-11-02 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673048#comment-16673048
 ] 

Bibin A Chundatt commented on YARN-8898:


Thank you [~botong] for the explanation.

Will take a look at YARN-8933.

> Fix FederationInterceptor#allocate to set application priority in 
> allocateResponse
> --
>
> Key: YARN-8898
> URL: https://issues.apache.org/jira/browse/YARN-8898
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
>
> FederationInterceptor#mergeAllocateResponses skips application_priority in 
> the returned response.
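
A minimal sketch of the fix implied by the description, using simplified stand-in types instead of YARN's AllocateResponse (the field and helper names below are illustrative, not the actual FederationInterceptor code): when per-sub-cluster responses are merged, the application priority from the home sub-cluster response must be copied into the merged response.

{code:java}
import java.util.ArrayList;
import java.util.List;

public final class MergeResponsesSketch {

  /** Minimal stand-in for the response fields relevant to this issue. */
  static final class Response {
    final List<String> allocatedContainers = new ArrayList<>();
    Integer applicationPriority; // may be unset on secondary responses
  }

  private MergeResponsesSketch() {
  }

  /** Merges secondary responses into the home response, keeping priority. */
  static Response merge(Response home, List<Response> secondaries) {
    Response merged = new Response();
    merged.allocatedContainers.addAll(home.allocatedContainers);
    // The reported bug is effectively a missing copy like this one: without
    // it the merged response carries no application priority.
    merged.applicationPriority = home.applicationPriority;
    for (Response r : secondaries) {
      merged.allocatedContainers.addAll(r.allocatedContainers);
    }
    return merged;
  }
}
{code}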



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673037#comment-16673037
 ] 

Hadoop QA commented on YARN-8948:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 870 unchanged - 1 fixed = 870 total (was 871) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 39s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 28s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}162m 36s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueManagementDynamicEditPolicy
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8948 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12946649/YARN-8948.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 67af53276c3d 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 
17 11:07:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d16d5f7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/22406/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/22406/testReport/ |
| Max. process+thread count 

[jira] [Commented] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2018-11-02 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673035#comment-16673035
 ] 

Bibin A Chundatt commented on YARN-8948:


[~sunilg] Updating the severity. Not required to be a blocker.

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch, 
> YARN-8948.003.patch
>
>
> *Issue 1:*
> YARN-3635's intention was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> public abstract boolean initialize(
>     CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext binding 
> to CapacityScheduler
> *Issue 2:*
> {{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
> Scheduler
> {quote}
> * **Queue Mapping Interface based on Default or User Defined Placement 
> Rules** - This feature allows users to map a job to a specific queue based on 
> some default placement rule. For instance based on user & group, or 
> application name. User can also define their own placement rule.
> {quote}
> As per the current code, UserGroupMapping is always added in placementRule. 
> {{CapacityScheduler#updatePlacementRules}}
> {code}
> // Initialize placement rules
> Collection<String> placementRuleStrs = conf.getStringCollection(
>     YarnConfiguration.QUEUE_PLACEMENT_RULES);
> List<PlacementRule> placementRules = new ArrayList<>();
> ...
> // add UserGroupMappingPlacementRule if absent
> distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
> {code}
> PlacementRule configuration order is not maintained 
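
As a rough illustration of what "common for all YarnSchedulers" could look like, here is a sketch that initializes a rule against a scheduler-neutral handle instead of CapacitySchedulerContext. The interface and class names are illustrative assumptions, not the signatures committed in YARN-8948.

{code:java}
import java.io.IOException;

public final class PlacementRuleSketch {

  /** Minimal stand-in for whatever scheduler handle a rule needs. */
  interface SchedulerContext {
    String getSchedulerName();
  }

  /** A placement rule that no longer depends on CapacitySchedulerContext. */
  abstract static class PlacementRule {
    abstract boolean initialize(SchedulerContext context) throws IOException;
  }

  /** Example rule usable by any scheduler that supplies the neutral context. */
  static final class UserQueueRule extends PlacementRule {
    @Override
    boolean initialize(SchedulerContext context) throws IOException {
      // CapacityScheduler and FairScheduler could both provide a context,
      // so the same rule class would work for either scheduler.
      return context != null;
    }
  }

  private PlacementRuleSketch() {
  }
}
{code}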



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2018-11-02 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8948:
---
Priority: Major  (was: Critical)

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch, 
> YARN-8948.003.patch
>
>
> *Issue 1:*
> YARN-3635's intention was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> public abstract boolean initialize(
>     CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext binding 
> to CapacityScheduler
> *Issue 2:*
> {{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
> Scheduler
> {quote}
> * **Queue Mapping Interface based on Default or User Defined Placement 
> Rules** - This feature allows users to map a job to a specific queue based on 
> some default placement rule. For instance based on user & group, or 
> application name. User can also define their own placement rule.
> {quote}
> As per the current code, UserGroupMapping is always added in placementRule. 
> {{CapacityScheduler#updatePlacementRules}}
> {code}
> // Initialize placement rules
> Collection<String> placementRuleStrs = conf.getStringCollection(
>     YarnConfiguration.QUEUE_PLACEMENT_RULES);
> List<PlacementRule> placementRules = new ArrayList<>();
> ...
> // add UserGroupMappingPlacementRule if absent
> distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
> {code}
> PlacementRule configuration order is not maintained 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2018-11-02 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8948:
---
Target Version/s:   (was: 3.2.0)

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch, 
> YARN-8948.003.patch
>
>
> *Issue 1:*
> YARN-3635's intention was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> public abstract boolean initialize(
>     CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext binding 
> to CapacityScheduler
> *Issue 2:*
> {{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
> Scheduler
> {quote}
> * **Queue Mapping Interface based on Default or User Defined Placement 
> Rules** - This feature allows users to map a job to a specific queue based on 
> some default placement rule. For instance based on user & group, or 
> application name. User can also define their own placement rule.
> {quote}
> As per the current code, UserGroupMapping is always added in placementRule. 
> {{CapacityScheduler#updatePlacementRules}}
> {code}
> // Initialize placement rules
> Collection<String> placementRuleStrs = conf.getStringCollection(
>     YarnConfiguration.QUEUE_PLACEMENT_RULES);
> List<PlacementRule> placementRules = new ArrayList<>();
> ...
> // add UserGroupMappingPlacementRule if absent
> distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
> {code}
> PlacementRule configuration order is not maintained 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-02 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673031#comment-16673031
 ] 

Tao Yang commented on YARN-8233:


Attached v2 patch for review.

> NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal 
> whose allocatedOrReservedContainer is null
> -
>
> Key: YARN-8233
> URL: https://issues.apache.org/jira/browse/YARN-8233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8233.001.patch, YARN-8233.002.patch
>
>
> Recently we saw an NPE in CapacityScheduler#tryCommit when trying to find 
> the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from 
> an allocate/reserve proposal: allocatedOrReservedContainer was null, so an 
> NPE was thrown.
> Reference code:
> {code:java}
> // find the application to accept and apply the ResourceCommitRequest
> if (request.anythingAllocatedOrReserved()) {
>   ContainerAllocationProposal c =
>   request.getFirstAllocatedOrReservedContainer();
>   attemptId =
>   c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt()
>   .getApplicationAttemptId();   //NPE happens here
> } else { ...
> {code}
> The proposal was constructed in 
> {{CapacityScheduler#createResourceCommitRequest}} and 
> allocatedOrReservedContainer is possibly null in async-scheduling process 
> when node was lost or application was finished (details in 
> {{CapacityScheduler#getSchedulerContainer}}).
> Reference code:
> {code:java}
>   // Allocated something
>   List allocations =
>   csAssignment.getAssignmentInformation().getAllocationDetails();
>   if (!allocations.isEmpty()) {
> RMContainer rmContainer = allocations.get(0).rmContainer;
> allocated = new ContainerAllocationProposal<>(
> getSchedulerContainer(rmContainer, true),   //possibly null
> getSchedulerContainersToRelease(csAssignment),
> 
> getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
> false), csAssignment.getType(),
> csAssignment.getRequestLocalityType(),
> csAssignment.getSchedulingMode() != null ?
> csAssignment.getSchedulingMode() :
> SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
> csAssignment.getResource());
>   }
> {code}
> I think we should add a null check for allocatedOrReservedContainer before 
> creating allocate/reserve proposals. Besides, the allocation process has 
> already increased the unconfirmed resource of the app when creating an 
> allocate assignment, so if this check finds null, we should decrease the 
> unconfirmed resource of the live app.
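
A minimal sketch of the null check proposed above, written against simplified stand-in types rather than the actual CapacityScheduler classes: skip the proposal when the scheduler container cannot be resolved, and roll back the unconfirmed resource that was already added for the assignment.

{code:java}
// Simplified illustration of the proposed guard; the real code deals with
// RMContainer/SchedulerContainer inside CapacityScheduler.
public final class ProposalGuardSketch {

  /** Minimal stand-in for a resolved scheduler container. */
  static final class SchedulerContainer {
    final String attemptId;

    SchedulerContainer(String attemptId) {
      this.attemptId = attemptId;
    }
  }

  /** Minimal stand-in for the application's resource bookkeeping. */
  interface App {
    void decUnconfirmedRes(long resource);
  }

  private ProposalGuardSketch() {
  }

  /**
   * Returns the container to place in the proposal, or null when it cannot
   * be resolved (node lost / app finished), after undoing the unconfirmed
   * resource that was added when the assignment was created.
   */
  static SchedulerContainer resolveOrRollback(SchedulerContainer resolved,
      App app, long allocatedResource) {
    if (resolved == null) {
      // Without this branch the caller would later dereference the missing
      // container and hit the NPE described in the issue.
      app.decUnconfirmedRes(allocatedResource);
      return null;
    }
    return resolved;
  }
}
{code}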



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-02 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-8233:
---
Attachment: YARN-8233.002.patch

> NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal 
> whose allocatedOrReservedContainer is null
> -
>
> Key: YARN-8233
> URL: https://issues.apache.org/jira/browse/YARN-8233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8233.001.patch, YARN-8233.002.patch
>
>
> Recently we saw an NPE in CapacityScheduler#tryCommit when trying to find 
> the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from 
> an allocate/reserve proposal: allocatedOrReservedContainer was null, so an 
> NPE was thrown.
> Reference code:
> {code:java}
> // find the application to accept and apply the ResourceCommitRequest
> if (request.anythingAllocatedOrReserved()) {
>   ContainerAllocationProposal c =
>   request.getFirstAllocatedOrReservedContainer();
>   attemptId =
>   c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt()
>   .getApplicationAttemptId();   //NPE happens here
> } else { ...
> {code}
> The proposal was constructed in 
> {{CapacityScheduler#createResourceCommitRequest}} and 
> allocatedOrReservedContainer is possibly null in async-scheduling process 
> when node was lost or application was finished (details in 
> {{CapacityScheduler#getSchedulerContainer}}).
> Reference code:
> {code:java}
>   // Allocated something
>   List allocations =
>   csAssignment.getAssignmentInformation().getAllocationDetails();
>   if (!allocations.isEmpty()) {
> RMContainer rmContainer = allocations.get(0).rmContainer;
> allocated = new ContainerAllocationProposal<>(
> getSchedulerContainer(rmContainer, true),   //possibly null
> getSchedulerContainersToRelease(csAssignment),
> 
> getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
> false), csAssignment.getType(),
> csAssignment.getRequestLocalityType(),
> csAssignment.getSchedulingMode() != null ?
> csAssignment.getSchedulingMode() :
> SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
> csAssignment.getResource());
>   }
> {code}
> I think we should add a null check for allocatedOrReservedContainer before 
> creating allocate/reserve proposals. Besides, the allocation process has 
> already increased the unconfirmed resource of the app when creating an 
> allocate assignment, so if this check finds null, we should decrease the 
> unconfirmed resource of the live app.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2018-11-02 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673027#comment-16673027
 ] 

Wilfred Spiegelenburg commented on YARN-8948:
-

[~bibinchundatt] Thank you for picking this up. I had just started looking at 
this to move the FairScheduler over to the same interface. I have opened a new 
jira for that already and will start with that as soon as this jira is finished.

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch, 
> YARN-8948.003.patch
>
>
> *Issue 1:*
> YARN-3635's intention was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> public abstract boolean initialize(
>     CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext binding 
> to CapacityScheduler
> *Issue 2:*
> {{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
> Scheduler
> {quote}
> * **Queue Mapping Interface based on Default or User Defined Placement 
> Rules** - This feature allows users to map a job to a specific queue based on 
> some default placement rule. For instance based on user & group, or 
> application name. User can also define their own placement rule.
> {quote}
> As per the current code, UserGroupMapping is always added in placementRule. 
> {{CapacityScheduler#updatePlacementRules}}
> {code}
> // Initialize placement rules
> Collection<String> placementRuleStrs = conf.getStringCollection(
>     YarnConfiguration.QUEUE_PLACEMENT_RULES);
> List<PlacementRule> placementRules = new ArrayList<>();
> ...
> // add UserGroupMappingPlacementRule if absent
> distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
> {code}
> PlacementRule configuration order is not maintained 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8967) Change FairScheduler to use PlacementRule interface

2018-11-02 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-8967:
---

 Summary: Change FairScheduler to use PlacementRule interface
 Key: YARN-8967
 URL: https://issues.apache.org/jira/browse/YARN-8967
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The PlacementRule interface was introduced to be used by all schedulers as per 
YARN-3635. The CapacityScheduler is using it but the FairScheduler is not and 
is using its own rule definition.

YARN-8948 cleans up the implementation and removes the CS references which 
should allow this change to go through.
This would be the first step in using one placement rule engine for both 
schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2018-11-02 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672961#comment-16672961
 ] 

Sunil Govindan commented on YARN-8948:
--

On this note, this is not an interface level compatibility problem.

Ideally we should have done this for all schedulers; however, it is not 
affecting the interface, so I suggest marking this as Major, getting it into 
all the respective versions, and letting the next release pick it up.

If you feel this is really a blocker to 3.2, kindly let me know.

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch, 
> YARN-8948.003.patch
>
>
> *Issue 1:*
> YARN-3635's intention was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> public abstract boolean initialize(
>     CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext binding 
> to CapacityScheduler
> *Issue 2:*
> {{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
> Scheduler
> {quote}
> * **Queue Mapping Interface based on Default or User Defined Placement 
> Rules** - This feature allows users to map a job to a specific queue based on 
> some default placement rule. For instance based on user & group, or 
> application name. User can also define their own placement rule.
> {quote}
> As per the current code, UserGroupMapping is always added in placementRule. 
> {{CapacityScheduler#updatePlacementRules}}
> {code}
> // Initialize placement rules
> Collection<String> placementRuleStrs = conf.getStringCollection(
>     YarnConfiguration.QUEUE_PLACEMENT_RULES);
> List<PlacementRule> placementRules = new ArrayList<>();
> ...
> // add UserGroupMappingPlacementRule if absent
> distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
> {code}
> PlacementRule configuration order is not maintained 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2018-11-02 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672922#comment-16672922
 ] 

Bibin A Chundatt commented on YARN-8948:


[~sunilg]

{code}
Could you please explain how this is breaking wire compatibility
{code}
This is not breaking wire compatibility since nothing is transmitted between processes.

In the jira I mentioned that the *interface is supposed to be for all 
schedulers*; it shouldn't be for the CapacityScheduler alone. Otherwise the next 
version might have to change it again.

{code}
If this is not breaking any compatibility, I would like to degrade the severity 
to Minor so that the 3.2 release can go on.
{code}
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html
 
Since we haven't marked {{PlacementRule}} as public, interface-level 
compatibility might not be required.

YARN-8016 refined the placementRule and made it usable.

Another issue: it seems UserGroup*Rule is currently always added and the order 
is not maintained for {{distingushRuleSet}}, so a custom configuration will not 
work as expected. Consider this a functional issue.

Let's take a call based on the responses from [~leftnoteasy] and 
[~suma.shivaprasad] too.

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch, 
> YARN-8948.003.patch
>
>
> *Issue 1:*
> YARN-3635's intention was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> public abstract boolean initialize(
>     CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext binding 
> to CapacityScheduler
> *Issue 2:*
> {{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
> Scheduler
> {quote}
> * **Queue Mapping Interface based on Default or User Defined Placement 
> Rules** - This feature allows users to map a job to a specific queue based on 
> some default placement rule. For instance based on user & group, or 
> application name. User can also define their own placement rule.
> {quote}
> As per the current code, UserGroupMapping is always added in placementRule. 
> {{CapacityScheduler#updatePlacementRules}}
> {code}
> // Initialize placement rules
> Collection<String> placementRuleStrs = conf.getStringCollection(
>     YarnConfiguration.QUEUE_PLACEMENT_RULES);
> List<PlacementRule> placementRules = new ArrayList<>();
> ...
> // add UserGroupMappingPlacementRule if absent
> distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
> {code}
> PlacementRule configuration order is not maintained 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2018-11-02 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672883#comment-16672883
 ] 

Sunil Govindan commented on YARN-8948:
--

[~bibinchundatt]

A few questions:
 # Could you please explain how this is breaking wire compatibility between an 
old hadoop version and a new one?
 # These jiras are already in hadoop-3.1 and other releases; is this therefore 
already broken in those versions?
 # Are there any workarounds if you upgrade to hadoop-3.2 and then use old 
clients/apps so that we will not see any breakage?

If this is not breaking any compatibility, I would like to degrade the severity 
to Minor so that the 3.2 release can go on.

cc [~leftnoteasy] [~suma.shivaprasad]

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch, 
> YARN-8948.003.patch
>
>
> *Issue 1:*
> YARN-3635's intention was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> public abstract boolean initialize(
>     CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext binding 
> to CapacityScheduler
> *Issue 2:*
> {{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
> Scheduler
> {quote}
> * **Queue Mapping Interface based on Default or User Defined Placement 
> Rules** - This feature allows users to map a job to a specific queue based on 
> some default placement rule. For instance based on user & group, or 
> application name. User can also define their own placement rule.
> {quote}
> As per the current code, UserGroupMapping is always added in placementRule. 
> {{CapacityScheduler#updatePlacementRules}}
> {code}
> // Initialize placement rules
> Collection<String> placementRuleStrs = conf.getStringCollection(
>     YarnConfiguration.QUEUE_PLACEMENT_RULES);
> List<PlacementRule> placementRules = new ArrayList<>();
> ...
> // add UserGroupMappingPlacementRule if absent
> distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
> {code}
> PlacementRule configuration order is not maintained 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8958) Schedulable entities leak in fair ordering policy when recovering containers between remove app attempt and remove app

2018-11-02 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672875#comment-16672875
 ] 

Tao Yang commented on YARN-8958:


{quote}
I am not sure about that... The cached usage is used by the FairComparator to 
determine the ordering of schedulable entities, we need to make sure that is 
updated correctly.
{quote}
This update just refreshes the entity's cached pending/used values to its 
current pending/used values, which is why it cannot be updated incorrectly.
{quote}
so we don't need to change #reorderScheduleEntities logic, doesn't that make 
sense?
{quote}
I think this change solves most cases but not all. In a race condition it is 
possible that the schedulable entity still existed when it was put into 
entitiesToReorder but was removed by another thread immediately after. For example:
(1) Thread1 -> AbstractComparatorOrderingPolicy#removeSchedulableEntity has just 
finished its synchronized block but has not yet removed this schedulable entity 
from schedulableEntities
(2) Thread2 -> AbstractComparatorOrderingPolicy#entityRequiresReordering finishes 
its synchronized block: the schedulable entity is put back into entitiesToReorder
(3) Thread1 -> AbstractComparatorOrderingPolicy#removeSchedulableEntity removes 
this schedulable entity from schedulableEntities
Thoughts?

> Schedulable entities leak in fair ordering policy when recovering containers 
> between remove app attempt and remove app
> --
>
> Key: YARN-8958
> URL: https://issues.apache.org/jira/browse/YARN-8958
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8958.001.patch, YARN-8958.002.patch
>
>
> We found an NPE in ClientRMService#getApplications when querying apps with a 
> specified queue. The cause is that one app cannot be found by calling 
> RMContextImpl#getRMApps (it is finished and swapped out of memory) but can 
> still be queried from the fair ordering policy.
> To reproduce schedulable entities leak in fair ordering policy:
> (1) create app1 and launch container1 on node1
> (2) restart RM
> (3) remove app1 attempt, app1 is removed from the schedulable entities.
> (4) recover container1 after node1 reconnects to the RM; the state of 
> container1 is changed to COMPLETED, app1 is brought back to entitiesToReorder 
> after the container is released, and app1 will then be added back into the 
> schedulable entities when the scheduler calls FairOrderingPolicy#getAssignmentIterator.
> (5) remove app1
> To solve this problem, we should make sure schedulableEntities can only be 
> affected by adding or removing an app attempt; a new entity should not be 
> added into schedulableEntities by the reordering process.
> {code:java}
>   protected void reorderSchedulableEntity(S schedulableEntity) {
> //remove, update comparable data, and reinsert to update position in order
> schedulableEntities.remove(schedulableEntity);
> updateSchedulingResourceUsage(
>   schedulableEntity.getSchedulingResourceUsage());
> schedulableEntities.add(schedulableEntity);
>   }
> {code}
> The code above can be improved as follows to make sure only an existing 
> entity can be re-added into schedulableEntities.
> {code:java}
>   protected void reorderSchedulableEntity(S schedulableEntity) {
> //remove, update comparable data, and reinsert to update position in order
> boolean exists = schedulableEntities.remove(schedulableEntity);
> updateSchedulingResourceUsage(
>   schedulableEntity.getSchedulingResourceUsage());
> if (exists) {
>   schedulableEntities.add(schedulableEntity);
> } else {
>   LOG.info("Skip reordering non-existent schedulable entity: "
>   + schedulableEntity.getId());
> }
>   }
> {code}
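
A self-contained sketch of why checking the return value of remove() closes the leak; a plain TreeSet stands in for the policy's schedulableEntities here, so this is an illustration of the proposed guard rather than the actual ordering-policy code.

{code:java}
import java.util.TreeSet;

public final class ReorderGuardSketch {

  private ReorderGuardSketch() {
  }

  /** Reinserts the entity only if it was actually present before. */
  static void reorder(TreeSet<String> entities, String entity) {
    boolean exists = entities.remove(entity);
    // ... the real policy refreshes the cached resource usage here ...
    if (exists) {
      entities.add(entity); // reinsert to update its position in order
    }
    // else: the entity was already removed (e.g. app attempt removed),
    // so re-adding it would leak a stale entry.
  }

  public static void main(String[] args) {
    TreeSet<String> entities = new TreeSet<>();
    reorder(entities, "app1"); // app1 was never added, so it stays absent
    System.out.println(entities.contains("app1")); // prints false
  }
}
{code}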



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-02 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672853#comment-16672853
 ] 

Weiwei Yang commented on YARN-8233:
---

Thanks [~Tao Yang], sounds good. 

> NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal 
> whose allocatedOrReservedContainer is null
> -
>
> Key: YARN-8233
> URL: https://issues.apache.org/jira/browse/YARN-8233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8233.001.patch
>
>
> Recently we saw an NPE in CapacityScheduler#tryCommit when trying to find 
> the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from 
> an allocate/reserve proposal: allocatedOrReservedContainer was null, so an 
> NPE was thrown.
> Reference code:
> {code:java}
> // find the application to accept and apply the ResourceCommitRequest
> if (request.anythingAllocatedOrReserved()) {
>   ContainerAllocationProposal c =
>   request.getFirstAllocatedOrReservedContainer();
>   attemptId =
>   c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt()
>   .getApplicationAttemptId();   //NPE happens here
> } else { ...
> {code}
> The proposal was constructed in 
> {{CapacityScheduler#createResourceCommitRequest}} and 
> allocatedOrReservedContainer is possibly null in async-scheduling process 
> when node was lost or application was finished (details in 
> {{CapacityScheduler#getSchedulerContainer}}).
> Reference code:
> {code:java}
>   // Allocated something
>   List allocations =
>   csAssignment.getAssignmentInformation().getAllocationDetails();
>   if (!allocations.isEmpty()) {
> RMContainer rmContainer = allocations.get(0).rmContainer;
> allocated = new ContainerAllocationProposal<>(
> getSchedulerContainer(rmContainer, true),   //possibly null
> getSchedulerContainersToRelease(csAssignment),
> 
> getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
> false), csAssignment.getType(),
> csAssignment.getRequestLocalityType(),
> csAssignment.getSchedulingMode() != null ?
> csAssignment.getSchedulingMode() :
> SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
> csAssignment.getResource());
>   }
> {code}
> I think we should add a null check for allocatedOrReservedContainer before 
> creating allocate/reserve proposals. Besides, the allocation process has 
> already increased the unconfirmed resource of the app when creating an 
> allocate assignment, so if this check finds null, we should decrease the 
> unconfirmed resource of the live app.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-02 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672835#comment-16672835
 ] 

Tao Yang commented on YARN-8233:


Hi, [~surmountian], [~cheersyang]
{quote}I also met this issue in Hadoop 2.9, would you also patch for Hadoop 2.9?
 Maybe some unit test cases could be added to cover the logic.
{quote}
Yes, I would like to update this patch with a test case and then backport it to 2.9.
{quote}does this issue only happen when async-scheduling is enabled?
{quote}
I think so. The sync-scheduling process holds the write lock of CapacityScheduler 
and makes sure that the return value of CapacityScheduler#getSchedulerContainer 
is not null.
{quote}Regarding to the patch, CapacityScheduler#getSchedulerContainer was 
being called multiple places, I think we need to add null check for all of them.
{quote}
Thanks for the reminder; CapacityScheduler#getSchedulerContainersToRelease seems 
to need a null check too. A rough sketch of the idea follows below.
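As an illustration only, a possible shape of that null check, assuming the method builds the release list by mapping {{CSAssignment#getContainersToKill}} through {{getSchedulerContainer}}; the body below is not the actual method and may differ from the real code:
{code:java}
// Illustrative sketch only: drop entries for which getSchedulerContainer()
// returns null (node lost or application finished) instead of passing null
// into the release list. The real method body may differ.
private List<SchedulerContainer<FiCaSchedulerApp, FiCaSchedulerNode>>
    getSchedulerContainersToRelease(CSAssignment csAssignment) {
  List<SchedulerContainer<FiCaSchedulerApp, FiCaSchedulerNode>> toRelease =
      new ArrayList<>();
  if (csAssignment.getContainersToKill() != null) {
    for (RMContainer rmContainer : csAssignment.getContainersToKill()) {
      SchedulerContainer<FiCaSchedulerApp, FiCaSchedulerNode> container =
          getSchedulerContainer(rmContainer, false);
      if (container != null) {   // the null check discussed above
        toRelease.add(container);
      }
    }
  }
  return toRelease;
}
{code}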

> NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal 
> whose allocatedOrReservedContainer is null
> -
>
> Key: YARN-8233
> URL: https://issues.apache.org/jira/browse/YARN-8233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8233.001.patch
>
>
> Recently we saw an NPE in CapacityScheduler#tryCommit when trying to find 
> the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from 
> an allocate/reserve proposal: allocatedOrReservedContainer was null and an NPE 
> was thrown.
> Reference code:
> {code:java}
> // find the application to accept and apply the ResourceCommitRequest
> if (request.anythingAllocatedOrReserved()) {
>   ContainerAllocationProposal c =
>   request.getFirstAllocatedOrReservedContainer();
>   attemptId =
>   c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt()
>   .getApplicationAttemptId();   //NPE happens here
> } else { ...
> {code}
> The proposal was constructed in 
> {{CapacityScheduler#createResourceCommitRequest}} and 
> allocatedOrReservedContainer is possibly null in async-scheduling process 
> when node was lost or application was finished (details in 
> {{CapacityScheduler#getSchedulerContainer}}).
> Reference code:
> {code:java}
>   // Allocated something
>   List allocations =
>   csAssignment.getAssignmentInformation().getAllocationDetails();
>   if (!allocations.isEmpty()) {
> RMContainer rmContainer = allocations.get(0).rmContainer;
> allocated = new ContainerAllocationProposal<>(
> getSchedulerContainer(rmContainer, true),   //possibly null
> getSchedulerContainersToRelease(csAssignment),
> 
> getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
> false), csAssignment.getType(),
> csAssignment.getRequestLocalityType(),
> csAssignment.getSchedulingMode() != null ?
> csAssignment.getSchedulingMode() :
> SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
> csAssignment.getResource());
>   }
> {code}
> I think we should add a null check for allocatedOrReservedContainer before 
> creating allocate/reserve proposals. Besides, the allocation process has already 
> increased the unconfirmed resource of the app when creating an allocate 
> assignment, so if this check finds null, we should decrease the unconfirmed 
> resource of the live app.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672826#comment-16672826
 ] 

Hadoop QA commented on YARN-8233:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}105m 57s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 0s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}162m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8233 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12921116/YARN-8233.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux b4d89e094499 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d16d5f7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/22405/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/22405/testReport/ |
| Max. process+thread count | 956 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-8958) Schedulable entities leak in fair ordering policy when recovering containers between remove app attempt and remove app

2018-11-02 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672817#comment-16672817
 ] 

Weiwei Yang commented on YARN-8958:
---

Hi [~Tao Yang]
{quote}This calling can make schedulable entity more correctly because inside 
it just cached resource usage of itself get fresher and no harm to others.
{quote}
I am not sure about that... The cached usage is used by the FairComparator to 
determine the ordering of schedulable entities, so we need to make sure it is 
updated correctly. Going back to the fix, I came up with an alternative approach:
 # {{schedulableEntities}} maintains the full list of apps for the ordering policy; 
entities are added/removed when an app is added, removed or updated;
 # {{entitiesToReorder}} maintains the apps that need to be re-ordered; entities 
are added/removed when a container is allocated, released or updated (resource 
usage changes)

In that context, {{entitiesToReorder}} should be a subset of 
{{schedulableEntities}}. So why don't we ensure that with the following change?
{code:java}
protected void entityRequiresReordering(S schedulableEntity) {
  synchronized (entitiesToReorder) {
    if (schedulableEntities.contains(schedulableEntity)) {
       entitiesToReorder.put(schedulableEntity.getId(), schedulableEntity);
    }
  }
}
{code}
That way we don't need to change the #reorderScheduleEntities logic; doesn't that 
make sense?

 

> Schedulable entities leak in fair ordering policy when recovering containers 
> between remove app attempt and remove app
> --
>
> Key: YARN-8958
> URL: https://issues.apache.org/jira/browse/YARN-8958
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8958.001.patch, YARN-8958.002.patch
>
>
> We found an NPE in ClientRMService#getApplications when querying apps with a 
> specified queue. The cause is that there is one app which can't be found by 
> calling RMContextImpl#getRMApps (it is finished and swapped out of memory) but 
> can still be queried from the fair ordering policy.
> To reproduce the schedulable entities leak in the fair ordering policy:
> (1) create app1 and launch container1 on node1
> (2) restart RM
> (3) remove the app1 attempt; app1 is removed from the schedulable entities.
> (4) recover container1 after node1 reconnects to RM; the state of container1 is 
> changed to COMPLETED, app1 is brought back into entitiesToReorder after the 
> container is released, and app1 will then be added back into the schedulable 
> entities when the scheduler calls FairOrderingPolicy#getAssignmentIterator.
> (5) remove app1
> To solve this problem, we should make sure schedulableEntities can only be 
> affected by adding or removing an app attempt; a new entity should not be added 
> into schedulableEntities by the reordering process.
> {code:java}
>   protected void reorderSchedulableEntity(S schedulableEntity) {
> //remove, update comparable data, and reinsert to update position in order
> schedulableEntities.remove(schedulableEntity);
> updateSchedulingResourceUsage(
>   schedulableEntity.getSchedulingResourceUsage());
> schedulableEntities.add(schedulableEntity);
>   }
> {code}
> The code above can be improved as follows to make sure only an existing entity 
> can be re-added into schedulableEntities.
> {code:java}
>   protected void reorderSchedulableEntity(S schedulableEntity) {
> //remove, update comparable data, and reinsert to update position in order
> boolean exists = schedulableEntities.remove(schedulableEntity);
> updateSchedulingResourceUsage(
>   schedulableEntity.getSchedulingResourceUsage());
> if (exists) {
>   schedulableEntities.add(schedulableEntity);
> } else {
>   LOG.info("Skip reordering non-existent schedulable entity: "
>   + schedulableEntity.getId());
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-02 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672797#comment-16672797
 ] 

Weiwei Yang commented on YARN-8233:
---

[~Tao Yang], [~surmountian], does this issue only happen when async-scheduling 
is enabled?

Regarding the patch, {{CapacityScheduler#getSchedulerContainer}} is called in 
multiple places; I think we need to add a null check for all of them.

 

> NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal 
> whose allocatedOrReservedContainer is null
> -
>
> Key: YARN-8233
> URL: https://issues.apache.org/jira/browse/YARN-8233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8233.001.patch
>
>
> Recently we saw an NPE in CapacityScheduler#tryCommit when trying to find 
> the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from 
> an allocate/reserve proposal: allocatedOrReservedContainer was null and an NPE 
> was thrown.
> Reference code:
> {code:java}
> // find the application to accept and apply the ResourceCommitRequest
> if (request.anythingAllocatedOrReserved()) {
>   ContainerAllocationProposal c =
>   request.getFirstAllocatedOrReservedContainer();
>   attemptId =
>   c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt()
>   .getApplicationAttemptId();   //NPE happens here
> } else { ...
> {code}
> The proposal was constructed in 
> {{CapacityScheduler#createResourceCommitRequest}} and 
> allocatedOrReservedContainer is possibly null in async-scheduling process 
> when node was lost or application was finished (details in 
> {{CapacityScheduler#getSchedulerContainer}}).
> Reference code:
> {code:java}
>   // Allocated something
>   List allocations =
>   csAssignment.getAssignmentInformation().getAllocationDetails();
>   if (!allocations.isEmpty()) {
> RMContainer rmContainer = allocations.get(0).rmContainer;
> allocated = new ContainerAllocationProposal<>(
> getSchedulerContainer(rmContainer, true),   //possibly null
> getSchedulerContainersToRelease(csAssignment),
> 
> getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
> false), csAssignment.getType(),
> csAssignment.getRequestLocalityType(),
> csAssignment.getSchedulingMode() != null ?
> csAssignment.getSchedulingMode() :
> SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
> csAssignment.getResource());
>   }
> {code}
> I think we should add a null check for allocatedOrReservedContainer before 
> creating allocate/reserve proposals. Besides, the allocation process has already 
> increased the unconfirmed resource of the app when creating an allocate 
> assignment, so if this check finds null, we should decrease the unconfirmed 
> resource of the live app.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8686) Queue Management API - not returning JSON or XML response data when passing Accept header

2018-11-02 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB reassigned YARN-8686:
--

Assignee: Akhil PB

> Queue Management API - not returning JSON or XML response data when passing 
> Accept header
> -
>
> Key: YARN-8686
> URL: https://issues.apache.org/jira/browse/YARN-8686
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
>
> The API should return JSON or XML response data based on the Accept header. 
> Instead, the API returns plain text for both success and error scenarios.
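For reference, a minimal JAX-RS-style sketch of what honoring the Accept header could look like; the resource and DAO classes below are purely illustrative and are not the actual RM web service code:
{code:java}
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import javax.xml.bind.annotation.XmlRootElement;

// Illustrative DAO; JAXB annotations allow it to be rendered as JSON or XML.
@XmlRootElement
class QueueInfoExample {
  public String queueName;
  public String state;

  QueueInfoExample() { }  // no-arg constructor required by JAXB

  QueueInfoExample(String queueName, String state) {
    this.queueName = queueName;
    this.state = state;
  }
}

@Path("/example/queues")
public class ExampleQueueResource {
  // Declaring both media types lets the JAX-RS runtime choose JSON or XML
  // from the client's Accept header instead of falling back to plain text.
  @GET
  @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
  public Response getQueueInfo() {
    return Response.ok(new QueueInfoExample("root.default", "RUNNING")).build();
  }
}
{code}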



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8925) Updating distributed node attributes only when necessary

2018-11-02 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672769#comment-16672769
 ] 

Tao Yang edited comment on YARN-8925 at 11/2/18 8:41 AM:
-

Hi, [~cheersyang]. I have attached the V2 patch for review. It is not finished 
yet since some test cases are still needed; could you take a quick look and give 
your feedback on it?
The V2 patch makes the NM report node attributes only when they have been updated 
or the re-sync interval has elapsed, to avoid the overhead of comparing them in 
every heartbeat. As you advised, the logic is similar to reporting node labels. 
Common logic, such as judging whether labels/attributes should be sent to the RM, 
is extracted into HeartbeatSyncIfNeededHandler so that the handlers for labels and 
attributes can extend it and reuse the logic. For test cases, I have created 
TestNodeStatusUpdaterForAttributes to test the NM side and will add some new test 
methods in TestResourceTrackerService to test the RM side (similar to the node 
label test cases in TestResourceTrackerService). Any problems with that?
Looking forward to your feedback. Thanks!
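To make the "sync only if needed" idea concrete, here is a rough, hypothetical sketch; HeartbeatSyncIfNeededHandler is the name mentioned above, but the fields and the shouldSync() method below are illustrative assumptions, not the contents of the attached patch:
{code:java}
import java.util.Objects;

// Illustrative sketch of the "sync only if needed" decision; field and method
// names here are assumptions, not the contents of YARN-8925.002.patch.
public abstract class HeartbeatSyncIfNeededHandler<T> {
  private T lastReported;
  private long lastSyncTimeMs;
  private final long resyncIntervalMs;

  protected HeartbeatSyncIfNeededHandler(long resyncIntervalMs) {
    this.resyncIntervalMs = resyncIntervalMs;
  }

  /** Decide whether the current labels/attributes should be sent to the RM. */
  public synchronized boolean shouldSync(T current, long nowMs) {
    boolean changed = !Objects.equals(current, lastReported);
    boolean resyncElapsed = (nowMs - lastSyncTimeMs) >= resyncIntervalMs;
    if (changed || resyncElapsed) {
      lastReported = current;
      lastSyncTimeMs = nowMs;
      return true;
    }
    return false;
  }
}
{code}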


was (Author: tao yang):
Hi, [~cheersyang]. I have attached the V2 patch for review. It is not finished 
yet since some test cases are still needed; could you take a quick look and give 
your feedback on it?
The V2 patch makes the NM report node attributes only when they have been updated 
or the re-sync interval has elapsed, to avoid the overhead of comparing them in 
every heartbeat. As you advised, the logic is similar to reporting node labels. 
Common logic, such as judging whether labels/attributes should be sent to the RM, 
is extracted into HeartbeatSyncIfNeededHandler so that the handlers for labels and 
attributes can extend it and reuse the logic. For test cases, I created 
TestNodeStatusUpdaterForAttributes to test the NM side and will add some new test 
methods in TestResourceTrackerService to test the RM side (similar to the node 
label test cases in TestResourceTrackerService). Thoughts?

> Updating distributed node attributes only when necessary
> 
>
> Key: YARN-8925
> URL: https://issues.apache.org/jira/browse/YARN-8925
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: performance
> Attachments: YARN-8925.001.patch, YARN-8925.002.patch
>
>
> Currently, if distributed node attributes exist, an update for them happens in 
> every heartbeat between the NM and the RM even when there is no change. The 
> updating process holds NodeAttributesManagerImpl#writeLock and may have some 
> impact in a large cluster. We have found that the nodes UI of a large cluster 
> opens slowly and that most of the time is spent waiting for the lock in 
> NodeAttributesManagerImpl. I think this update should be performed only when 
> necessary to improve the performance of the related process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8925) Updating distributed node attributes only when necessary

2018-11-02 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672769#comment-16672769
 ] 

Tao Yang commented on YARN-8925:


Hi, [~cheersyang]. I have attached the V2 patch for review. It is not finished 
yet since some test cases are still needed; could you take a quick look and give 
your feedback on it?
The V2 patch makes the NM report node attributes only when they have been updated 
or the re-sync interval has elapsed, to avoid the overhead of comparing them in 
every heartbeat. As you advised, the logic is similar to reporting node labels. 
Common logic, such as judging whether labels/attributes should be sent to the RM, 
is extracted into HeartbeatSyncIfNeededHandler so that the handlers for labels and 
attributes can extend it and reuse the logic. For test cases, I created 
TestNodeStatusUpdaterForAttributes to test the NM side and will add some new test 
methods in TestResourceTrackerService to test the RM side (similar to the node 
label test cases in TestResourceTrackerService). Thoughts?

> Updating distributed node attributes only when necessary
> 
>
> Key: YARN-8925
> URL: https://issues.apache.org/jira/browse/YARN-8925
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: performance
> Attachments: YARN-8925.001.patch, YARN-8925.002.patch
>
>
> Currently, if distributed node attributes exist, an update for them happens in 
> every heartbeat between the NM and the RM even when there is no change. The 
> updating process holds NodeAttributesManagerImpl#writeLock and may have some 
> impact in a large cluster. We have found that the nodes UI of a large cluster 
> opens slowly and that most of the time is spent waiting for the lock in 
> NodeAttributesManagerImpl. I think this update should be performed only when 
> necessary to improve the performance of the related process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2018-11-02 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt reassigned YARN-8948:
--

Assignee: Bibin A Chundatt

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch, 
> YARN-8948.003.patch
>
>
> *Issue 1:*
> The intention of YARN-3635 was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> 33  public abstract boolean initialize(
> 34  CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext, which binds 
> it to CapacityScheduler.
> *Issue 2:*
> {{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
> Scheduler
> {quote}
> * **Queue Mapping Interface based on Default or User Defined Placement 
> Rules** - This feature allows users to map a job to a specific queue based on 
> some default placement rule. For instance based on user & group, or 
> application name. User can also define their own placement rule.
> {quote}
> As per the current code, UserGroupMapping is always added to the placement rules 
> in {{CapacityScheduler#updatePlacementRules}}:
> {code}
> // Initialize placement rules
> Collection placementRuleStrs = conf.getStringCollection(
> YarnConfiguration.QUEUE_PLACEMENT_RULES);
> List placementRules = new ArrayList<>();
> ...
> // add UserGroupMappingPlacementRule if absent
> distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
> {code}
> PlacementRule configuration order is not maintained 
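As a rough illustration of the direction Issue 1 suggests (a sketch only; the actual YARN-8948 patches may choose a different signature), a scheduler-neutral contract could look roughly like this, assuming the existing getPlacementForApp contract stays as-is:
{code:java}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;

// Illustrative, scheduler-neutral contract: initialize against the generic
// ResourceScheduler instead of CapacitySchedulerContext so that other
// schedulers can reuse placement rules.
public abstract class PlacementRule {

  public String getName() {
    return this.getClass().getName();
  }

  public abstract boolean initialize(ResourceScheduler scheduler)
      throws IOException;

  public abstract ApplicationPlacementContext getPlacementForApp(
      ApplicationSubmissionContext asc, String user) throws YarnException;
}
{code}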



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8925) Updating distributed node attributes only when necessary

2018-11-02 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-8925:
---
Attachment: YARN-8925.002.patch

> Updating distributed node attributes only when necessary
> 
>
> Key: YARN-8925
> URL: https://issues.apache.org/jira/browse/YARN-8925
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: performance
> Attachments: YARN-8925.001.patch, YARN-8925.002.patch
>
>
> Currently, if distributed node attributes exist, an update for them happens in 
> every heartbeat between the NM and the RM even when there is no change. The 
> updating process holds NodeAttributesManagerImpl#writeLock and may have some 
> impact in a large cluster. We have found that the nodes UI of a large cluster 
> opens slowly and that most of the time is spent waiting for the lock in 
> NodeAttributesManagerImpl. I think this update should be performed only when 
> necessary to improve the performance of the related process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2018-11-02 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8948:
---
Attachment: YARN-8948.003.patch

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch, 
> YARN-8948.003.patch
>
>
> *Issue 1:*
> The intention of YARN-3635 was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> 33  public abstract boolean initialize(
> 34  CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext, which binds 
> it to CapacityScheduler.
> *Issue 2:*
> {{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
> Scheduler
> {quote}
> * **Queue Mapping Interface based on Default or User Defined Placement 
> Rules** - This feature allows users to map a job to a specific queue based on 
> some default placement rule. For instance based on user & group, or 
> application name. User can also define their own placement rule.
> {quote}
> As per the current code, UserGroupMapping is always added to the placement rules 
> in {{CapacityScheduler#updatePlacementRules}}:
> {code}
> // Initialize placement rules
> Collection placementRuleStrs = conf.getStringCollection(
> YarnConfiguration.QUEUE_PLACEMENT_RULES);
> List placementRules = new ArrayList<>();
> ...
> // add UserGroupMappingPlacementRule if absent
> distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
> {code}
> PlacementRule configuration order is not maintained 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2018-11-02 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8948:
---
Attachment: YARN-8948.003.patch

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch
>
>
> *Issue 1:*
> The intention of YARN-3635 was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> 33  public abstract boolean initialize(
> 34  CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext, which binds 
> it to CapacityScheduler.
> *Issue 2:*
> {{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
> Scheduler
> {quote}
> * **Queue Mapping Interface based on Default or User Defined Placement 
> Rules** - This feature allows users to map a job to a specific queue based on 
> some default placement rule. For instance based on user & group, or 
> application name. User can also define their own placement rule.
> {quote}
> As per the current code, UserGroupMapping is always added to the placement rules 
> in {{CapacityScheduler#updatePlacementRules}}:
> {code}
> // Initialize placement rules
> Collection placementRuleStrs = conf.getStringCollection(
> YarnConfiguration.QUEUE_PLACEMENT_RULES);
> List placementRules = new ArrayList<>();
> ...
> // add UserGroupMappingPlacementRule if absent
> distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
> {code}
> PlacementRule configuration order is not maintained 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2018-11-02 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8948:
---
Attachment: (was: YARN-8948.003.patch)

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch
>
>
> *Issue 1:*
> The intention of YARN-3635 was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> 33  public abstract boolean initialize(
> 34  CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext, which binds 
> it to CapacityScheduler.
> *Issue 2:*
> {{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
> Scheduler
> {quote}
> * **Queue Mapping Interface based on Default or User Defined Placement 
> Rules** - This feature allows users to map a job to a specific queue based on 
> some default placement rule. For instance based on user & group, or 
> application name. User can also define their own placement rule.
> {quote}
> As per the current code, UserGroupMapping is always added to the placement rules 
> in {{CapacityScheduler#updatePlacementRules}}:
> {code}
> // Initialize placement rules
> Collection placementRuleStrs = conf.getStringCollection(
> YarnConfiguration.QUEUE_PLACEMENT_RULES);
> List placementRules = new ArrayList<>();
> ...
> // add UserGroupMappingPlacementRule if absent
> distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
> {code}
> PlacementRule configuration order is not maintained 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2018-11-02 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672686#comment-16672686
 ] 

Bibin A Chundatt commented on YARN-8948:


[~sunil.gov...@gmail.com]

Could you look into this issue?

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch
>
>
> *Issue 1:*
> The intention of YARN-3635 was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> 33  public abstract boolean initialize(
> 34  CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext, which binds 
> it to CapacityScheduler.
> *Issue 2:*
> {{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
> Scheduler
> {quote}
> * **Queue Mapping Interface based on Default or User Defined Placement 
> Rules** - This feature allows users to map a job to a specific queue based on 
> some default placement rule. For instance based on user & group, or 
> application name. User can also define their own placement rule.
> {quote}
> As per the current code, UserGroupMapping is always added to the placement rules 
> in {{CapacityScheduler#updatePlacementRules}}:
> {code}
> // Initialize placement rules
> Collection placementRuleStrs = conf.getStringCollection(
> YarnConfiguration.QUEUE_PLACEMENT_RULES);
> List placementRules = new ArrayList<>();
> ...
> // add UserGroupMappingPlacementRule if absent
> distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
> {code}
> PlacementRule configuration order is not maintained 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2018-11-02 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8948:
---
Description: 
*Issue 1:*

The intention of YARN-3635 was to add a PlacementRule interface common to all 
YarnSchedulers.
{code}
33  public abstract boolean initialize(
34  CapacitySchedulerContext schedulerContext) throws IOException;
{code}
PlacementRule initialization is done using CapacitySchedulerContext, which binds 
it to CapacityScheduler.

*Issue 2:*

{{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
Scheduler
{quote}
* **Queue Mapping Interface based on Default or User Defined Placement Rules** 
- This feature allows users to map a job to a specific queue based on some 
default placement rule. For instance based on user & group, or application 
name. User can also define their own placement rule.
{quote}
As per the current code, UserGroupMapping is always added to the placement rules 
in {{CapacityScheduler#updatePlacementRules}}:
{code}
// Initialize placement rules
Collection placementRuleStrs = conf.getStringCollection(
YarnConfiguration.QUEUE_PLACEMENT_RULES);
List placementRules = new ArrayList<>();
...

// add UserGroupMappingPlacementRule if absent
distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
{code}
PlacementRule configuration order is not maintained 

  was:
*Issue 1:*

The intention of YARN-3635 was to add a PlacementRule interface common to all 
YarnSchedulers.
{code}
33  public abstract boolean initialize(
34  CapacitySchedulerContext schedulerContext) throws IOException;
{code}
PlacementRule initialization is done using CapacitySchedulerContext, which binds 
it to CapacityScheduler.

*Issue 2:*

{{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
Scheduler
{quote}
* **Queue Mapping Interface based on Default or User Defined Placement Rules** 
- This feature allows users to map a job to a specific queue based on some 
default placement rule. For instance based on user & group, or application 
name. User can also define their own placement rule.
{quote}
As per the current code, UserGroupMapping is always added to the placement rules 
in {{CapacityScheduler#updatePlacementRules}}:
{code}
// Initialize placement rules
Collection placementRuleStrs = conf.getStringCollection(
YarnConfiguration.QUEUE_PLACEMENT_RULES);
List placementRules = new ArrayList<>();
...

// add UserGroupMappingPlacementRule if absent
distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
{code}


> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch
>
>
> *Issue 1:*
> The intention of YARN-3635 was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> 33  public abstract boolean initialize(
> 34  CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext, which binds 
> it to CapacityScheduler.
> *Issue 2:*
> {{yarn.scheduler.queue-placement-rules}} doesn't work as expected in Capacity 
> Scheduler
> {quote}
> * **Queue Mapping Interface based on Default or User Defined Placement 
> Rules** - This feature allows users to map a job to a specific queue based on 
> some default placement rule. For instance based on user & group, or 
> application name. User can also define their own placement rule.
> {quote}
> As per the current code, UserGroupMapping is always added to the placement rules 
> in {{CapacityScheduler#updatePlacementRules}}:
> {code}
> // Initialize placement rules
> Collection placementRuleStrs = conf.getStringCollection(
> YarnConfiguration.QUEUE_PLACEMENT_RULES);
> List placementRules = new ArrayList<>();
> ...
> // add UserGroupMappingPlacementRule if absent
> distingushRuleSet.add(YarnConfiguration.USER_GROUP_PLACEMENT_RULE);
> {code}
> PlacementRule configuration order is not maintained 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8686) Queue Management API - not returning JSON or XML response data when passing Accept header

2018-11-02 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB resolved YARN-8686.

Resolution: Won't Fix

Changing the output of the RESTful API might be incompatible with old clients.

> Queue Management API - not returning JSON or XML response data when passing 
> Accept header
> -
>
> Key: YARN-8686
> URL: https://issues.apache.org/jira/browse/YARN-8686
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Akhil PB
>Priority: Major
>
> The API should return JSON or XML response data based on the Accept header. 
> Instead, the API returns plain text for both success and error scenarios.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8905) [Router] Add JvmMetricsInfo and pause monitor

2018-11-02 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672659#comment-16672659
 ] 

Bibin A Chundatt commented on YARN-8905:


+1 latest patch looks good to me.

Will commit by tomorrow if no objections.

> [Router] Add JvmMetricsInfo and pause monitor
> -
>
> Key: YARN-8905
> URL: https://issues.apache.org/jira/browse/YARN-8905
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-8905-001.patch, YARN-8905-002.patch, 
> YARN-8905-003.patch, YARN-8905-004.patch, YARN-8905-005.patch
>
>
> Similar to the resourcemanager and nodemanager services, we can add JvmMetricsInfo 
> to the router service too.
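For context, a minimal sketch of how such wiring usually looks in Hadoop daemons, mirroring the ResourceManager/NodeManager pattern; the class and method below are illustrative and are not taken from the attached patches:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.source.JvmMetrics;
import org.apache.hadoop.util.JvmPauseMonitor;

// Hedged sketch: register JVM metrics and a pause monitor for the Router,
// in the same style the ResourceManager and NodeManager already use.
public class RouterMetricsSetupExample {
  private JvmPauseMonitor pauseMonitor;

  public void init(Configuration conf) {
    DefaultMetricsSystem.initialize("Router");
    JvmMetrics jvmMetrics = JvmMetrics.initSingleton("Router", null);

    // The pause monitor records long GC or scheduling pauses and feeds them
    // into the JVM metrics source.
    pauseMonitor = new JvmPauseMonitor();
    pauseMonitor.init(conf);
    pauseMonitor.start();
    jvmMetrics.setPauseMonitor(pauseMonitor);
  }
}
{code}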



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-02 Thread Xiao Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672650#comment-16672650
 ] 

Xiao Liang commented on YARN-8233:
--

Thanks [~Tao Yang]. I also met this issue in Hadoop 2.9; would you also provide a 
patch for Hadoop 2.9?

Maybe some unit test cases could be added to cover the logic.

> NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal 
> whose allocatedOrReservedContainer is null
> -
>
> Key: YARN-8233
> URL: https://issues.apache.org/jira/browse/YARN-8233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8233.001.patch
>
>
> Recently we saw an NPE in CapacityScheduler#tryCommit when trying to find 
> the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from 
> an allocate/reserve proposal: allocatedOrReservedContainer was null and an NPE 
> was thrown.
> Reference code:
> {code:java}
> // find the application to accept and apply the ResourceCommitRequest
> if (request.anythingAllocatedOrReserved()) {
>   ContainerAllocationProposal c =
>   request.getFirstAllocatedOrReservedContainer();
>   attemptId =
>   c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt()
>   .getApplicationAttemptId();   //NPE happens here
> } else { ...
> {code}
> The proposal was constructed in 
> {{CapacityScheduler#createResourceCommitRequest}} and 
> allocatedOrReservedContainer is possibly null in async-scheduling process 
> when node was lost or application was finished (details in 
> {{CapacityScheduler#getSchedulerContainer}}).
> Reference code:
> {code:java}
>   // Allocated something
>   List allocations =
>   csAssignment.getAssignmentInformation().getAllocationDetails();
>   if (!allocations.isEmpty()) {
> RMContainer rmContainer = allocations.get(0).rmContainer;
> allocated = new ContainerAllocationProposal<>(
> getSchedulerContainer(rmContainer, true),   //possibly null
> getSchedulerContainersToRelease(csAssignment),
> 
> getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
> false), csAssignment.getType(),
> csAssignment.getRequestLocalityType(),
> csAssignment.getSchedulingMode() != null ?
> csAssignment.getSchedulingMode() :
> SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
> csAssignment.getResource());
>   }
> {code}
> I think we should add a null check for allocatedOrReservedContainer before 
> creating allocate/reserve proposals. Besides, the allocation process has already 
> increased the unconfirmed resource of the app when creating an allocate 
> assignment, so if this check finds null, we should decrease the unconfirmed 
> resource of the live app.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8867) Retrieve the status of resource localization

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672619#comment-16672619
 ] 

Hadoop QA commented on YARN-8867:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 11 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  6m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
22m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  8m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m  
3s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 15m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 29s{color} | {color:orange} root: The patch generated 26 new + 481 unchanged 
- 0 fixed = 507 total (was 481) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m 
26s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  6s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  9m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
51s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
38s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
40s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 22s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}114m 45s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 27m 
45s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m  
2s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
18s{color} | 

[jira] [Commented] (YARN-8958) Schedulable entities leak in fair ordering policy when recovering containers between remove app attempt and remove app

2018-11-02 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672606#comment-16672606
 ] 

Tao Yang commented on YARN-8958:


Hi, [~cheersyang], thanks for the review.
{quote}
Can we also only do the resource usage updates when the schedulable entity 
exists?
otherwise, would the resource usage gets incorrectly updated?
{quote}
I think it's OK to update the resource usage for a non-existent schedulable 
entity; maybe the entity is not finished but has been moved from the ordering 
policy to the pending ordering policy, in which case it may still need this 
update. This call only refreshes the entity's own cached resource usage, so it 
makes the schedulable entity more accurate and does no harm to others.

> Schedulable entities leak in fair ordering policy when recovering containers 
> between remove app attempt and remove app
> --
>
> Key: YARN-8958
> URL: https://issues.apache.org/jira/browse/YARN-8958
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8958.001.patch, YARN-8958.002.patch
>
>
> We found an NPE in ClientRMService#getApplications when querying apps with a 
> specified queue. The cause is that there is one app which can't be found by 
> calling RMContextImpl#getRMApps (it is finished and swapped out of memory) but 
> can still be queried from the fair ordering policy.
> To reproduce the schedulable entities leak in the fair ordering policy:
> (1) create app1 and launch container1 on node1
> (2) restart RM
> (3) remove the app1 attempt; app1 is removed from the schedulable entities.
> (4) recover container1 after node1 reconnects to RM; the state of container1 is 
> changed to COMPLETED, app1 is brought back into entitiesToReorder after the 
> container is released, and app1 will then be added back into the schedulable 
> entities when the scheduler calls FairOrderingPolicy#getAssignmentIterator.
> (5) remove app1
> To solve this problem, we should make sure schedulableEntities can only be 
> affected by adding or removing an app attempt; a new entity should not be added 
> into schedulableEntities by the reordering process.
> {code:java}
>   protected void reorderSchedulableEntity(S schedulableEntity) {
> //remove, update comparable data, and reinsert to update position in order
> schedulableEntities.remove(schedulableEntity);
> updateSchedulingResourceUsage(
>   schedulableEntity.getSchedulingResourceUsage());
> schedulableEntities.add(schedulableEntity);
>   }
> {code}
> The code above can be improved as follows to make sure only an existing entity 
> can be re-added into schedulableEntities.
> {code:java}
>   protected void reorderSchedulableEntity(S schedulableEntity) {
> //remove, update comparable data, and reinsert to update position in order
> boolean exists = schedulableEntities.remove(schedulableEntity);
> updateSchedulingResourceUsage(
>   schedulableEntity.getSchedulingResourceUsage());
> if (exists) {
>   schedulableEntities.add(schedulableEntity);
> } else {
>   LOG.info("Skip reordering non-existent schedulable entity: "
>   + schedulableEntity.getId());
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org