[jira] [Updated] (YARN-10416) Typos in YarnScheduler#allocate method's doc comment

2020-09-01 Thread Siddharth Ahuja (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Ahuja updated YARN-10416:
---
Attachment: (was: YARN-10416.001.patch)

> Typos in YarnScheduler#allocate method's doc comment
> 
>
> Key: YARN-10416
> URL: https://issues.apache.org/jira/browse/YARN-10416
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: docs
>Reporter: Wanqiang Ji
>Assignee: Siddharth Ahuja
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-10416.001.patch
>
>
> {code:java}
> /**
>  * The main api between the ApplicationMaster and the Scheduler.
>  * The ApplicationMaster is updating his future resource requirements
>  * and may release containers he doens't need.
>  */
> {code}
>  
> Correct "doens't" to "doesn't".






[jira] [Updated] (YARN-10416) Typos in YarnScheduler#allocate method's doc comment

2020-09-01 Thread Siddharth Ahuja (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Ahuja updated YARN-10416:
---
Attachment: YARN-10416.001.patch

> Typos in YarnScheduler#allocate method's doc comment
> 
>
> Key: YARN-10416
> URL: https://issues.apache.org/jira/browse/YARN-10416
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: docs
>Reporter: Wanqiang Ji
>Assignee: Siddharth Ahuja
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-10416.001.patch
>
>
> {code:java}
> /**
>  * The main api between the ApplicationMaster and the Scheduler.
>  * The ApplicationMaster is updating his future resource requirements
>  * and may release containers he doens't need.
>  */
> {code}
>  
> Correct "doens't" to "doesn't".






[jira] [Commented] (YARN-10416) Typos in YarnScheduler#allocate method's doc comment

2020-09-01 Thread Siddharth Ahuja (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188991#comment-17188991
 ] 

Siddharth Ahuja commented on YARN-10416:


* Fixed up the overall method description.
* Added an explanation for each individual parameter.
* The {{updateRequests}} param's explanation was incorrectly set to the return 
type:
{code}
updateRequests - @return the Allocation for the application
{code}
Fixed this so that {{updateRequests}} has its own explanation and the return 
type is moved to its own line.
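
For reference, a minimal sketch of how the corrected doc comment might read after 
these changes; apart from {{updateRequests}} and the return description quoted 
above, the parameter names and wording are assumptions for illustration rather 
than the actual patch:
{code:java}
/**
 * The main API between the ApplicationMaster and the Scheduler.
 * The ApplicationMaster updates its future resource requirements
 * and may release containers it doesn't need.
 *
 * @param appAttemptId the application attempt issuing the request
 * @param ask the updated list of resource requests
 * @param release the containers the ApplicationMaster no longer needs
 * @param updateRequests the container update (e.g. resize) requests
 * @return the Allocation for the application
 */
{code}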

> Typos in YarnScheduler#allocate method's doc comment
> 
>
> Key: YARN-10416
> URL: https://issues.apache.org/jira/browse/YARN-10416
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: docs
>Reporter: Wanqiang Ji
>Assignee: Siddharth Ahuja
>Priority: Minor
>  Labels: newbie
>
> {code:java}
> /**
>  * The main api between the ApplicationMaster and the Scheduler.
>  * The ApplicationMaster is updating his future resource requirements
>  * and may release containers he doens't need.
>  */
> {code}
>  
> Correct "doens't" to "doesn't".






[jira] [Assigned] (YARN-10416) Typos in YarnScheduler#allocate method's doc comment

2020-09-01 Thread Siddharth Ahuja (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Ahuja reassigned YARN-10416:
--

Assignee: Siddharth Ahuja

> Typos in YarnScheduler#allocate method's doc comment
> 
>
> Key: YARN-10416
> URL: https://issues.apache.org/jira/browse/YARN-10416
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: docs
>Reporter: Wanqiang Ji
>Assignee: Siddharth Ahuja
>Priority: Minor
>  Labels: newbie
>
> {code:java}
> /**
>  * The main api between the ApplicationMaster and the Scheduler.
>  * The ApplicationMaster is updating his future resource requirements
>  * and may release containers he doens't need.
>  */
> {code}
>  
> Correct "doens't" to "doesn't".






[jira] [Commented] (YARN-10419) Javadoc error in hadoop-yarn-server-common module

2020-09-01 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188928#comment-17188928
 ] 

Akira Ajisaka commented on YARN-10419:
--

This was broken by YARN-10304; however, the precommit job passed, and I'm not sure 
why it succeeded.
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/80/artifact/out/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt

> Javadoc error in hadoop-yarn-server-common module
> -
>
> Key: YARN-10419
> URL: https://issues.apache.org/jira/browse/YARN-10419
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, documentation
>Reporter: Akira Ajisaka
>Priority: Major
>
> {noformat}
> $ mvn clean process-sources javadoc:javadoc-no-fork -pl 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
> (snip)
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/RemoteLogPathEntry.java:23:
>  error: unknown tag: ROOT_PATH
> [ERROR]  *   <ROOT_PATH>/%USER/<SUFFIX>
> [ERROR]  ^
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/RemoteLogPathEntry.java:23:
>  error: unknown tag: SUFFIX
> [ERROR]  *   <ROOT_PATH>/%USER/<SUFFIX>
> {noformat}
> Full log: https://gist.github.com/aajisaka/46fde3cbd9211fc09ee4040b85251e9c
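
These errors mean javadoc is parsing the angle-bracketed placeholders on that line 
as unknown HTML tags. A typical fix for this class of error, shown here only as an 
illustrative sketch and not as the actual patch, is to escape the placeholders:
{code:java}
/**
 * Sketch only: wrapping the placeholders in {@code ...} stops javadoc from
 * treating them as unknown HTML tags.
 * The remote log path is laid out as {@code <ROOT_PATH>/%USER/<SUFFIX>}.
 */
public class RemoteLogPathEntry {
}
{code}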






[jira] [Created] (YARN-10419) Javadoc error in hadoop-yarn-server-common module

2020-09-01 Thread Akira Ajisaka (Jira)
Akira Ajisaka created YARN-10419:


 Summary: Javadoc error in hadoop-yarn-server-common module
 Key: YARN-10419
 URL: https://issues.apache.org/jira/browse/YARN-10419
 Project: Hadoop YARN
  Issue Type: Bug
  Components: build, documentation
Reporter: Akira Ajisaka


{noformat}
$ mvn clean process-sources javadoc:javadoc-no-fork -pl 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
(snip)
[ERROR] 
/Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/RemoteLogPathEntry.java:23:
 error: unknown tag: ROOT_PATH
[ERROR]  *   <ROOT_PATH>/%USER/<SUFFIX>
[ERROR]  ^
[ERROR] 
/Users/aajisaka/git/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/RemoteLogPathEntry.java:23:
 error: unknown tag: SUFFIX
[ERROR]  *   <ROOT_PATH>/%USER/<SUFFIX>
{noformat}
Full log: https://gist.github.com/aajisaka/46fde3cbd9211fc09ee4040b85251e9c






[jira] [Comment Edited] (YARN-10372) Create MappingRule class to represent each CS mapping rule

2020-09-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188773#comment-17188773
 ] 

Peter Bacsko edited comment on YARN-10372 at 9/1/20, 7:38 PM:
--

[~shuzirra] please fix the indentation-related warnings.

Also please do the following:
1. In {{createLegacyRule()}}, it is important to call 
{{action.setFallbackDefaultPlacement()}} to ensure proper backward compatibility 
(a sketch of this follows below).
2. Add javadoc to the {{createLegacyRule()}} methods to describe what "legacy" 
means in this context.
3. Consider extending the unit tests to verify the fallback setting.


was (Author: pbacsko):
[~shuzirra] please fix the indentation-related warnings.

Also, in {{createLegacyRule()}}, it is important to call 
{{action.setFallbackDefaultPlacement()}} to ensure proper backward 
compatibility.
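
A minimal sketch of the {{createLegacyRule()}} suggestion above; apart from 
{{createLegacyRule()}}, {{MappingRule}} and {{setFallbackDefaultPlacement()}}, 
which are named in this thread, the types and method shape are assumptions for 
illustration rather than the actual patch:
{code:java}
// Illustrative sketch only; the matcher type and constructor shape are assumed.
private MappingRule createLegacyRule(MappingRuleMatcher matcher,
    MappingRuleAction action) {
  // Legacy rules should fall back to the default placement when the target
  // queue cannot be used, preserving the behaviour of the old mapping engine.
  action.setFallbackDefaultPlacement();
  return new MappingRule(matcher, action);
}
{code}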

> Create MappingRule class to represent each CS mapping rule
> --
>
> Key: YARN-10372
> URL: https://issues.apache.org/jira/browse/YARN-10372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10372.001.patch, YARN-10372.002.patch
>
>
> As per the design document attached to the umbrella Jira (YARN-10370), we 
> need a class which represents a mapping rule, which can be evaluated, and 
> determines if the rule applies to the current application placement and 
> determines the action to be taken.






[jira] [Comment Edited] (YARN-10372) Create MappingRule class to represent each CS mapping rule

2020-09-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188773#comment-17188773
 ] 

Peter Bacsko edited comment on YARN-10372 at 9/1/20, 7:35 PM:
--

[~shuzirra] please fix the indentation-related warnings.

Also, in {{createLegacyRule()}}, it is important to call 
{{action.setFallbackDefaultPlacement()}} to ensure proper backward 
compatibility.


was (Author: pbacsko):
[~shuzirra] please fix the indentation-related warnings.

> Create MappingRule class to represent each CS mapping rule
> --
>
> Key: YARN-10372
> URL: https://issues.apache.org/jira/browse/YARN-10372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10372.001.patch, YARN-10372.002.patch
>
>
> As per the design document attached to the umbrella Jira (YARN-10370), we 
> need a class which represents a mapping rule, which can be evaluated, and 
> determines if the rule applies to the current application placement and 
> determines the action to be taken.






[jira] [Commented] (YARN-10372) Create MappingRule class to represent each CS mapping rule

2020-09-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188773#comment-17188773
 ] 

Peter Bacsko commented on YARN-10372:
-

[~shuzirra] please fix the indentation-related warnings.

> Create MappingRule class to represent each CS mapping rule
> --
>
> Key: YARN-10372
> URL: https://issues.apache.org/jira/browse/YARN-10372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10372.001.patch, YARN-10372.002.patch
>
>
> As per the design document attached to the umbrella Jira (YARN-10370), we 
> need a class which represents a mapping rule, which can be evaluated, and 
> determines if the rule applies to the current application placement and 
> determines the action to be taken.






[jira] [Commented] (YARN-10372) Create MappingRule class to represent each CS mapping rule

2020-09-01 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188723#comment-17188723
 ] 

Hadoop QA commented on YARN-10372:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
44s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 30s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 8 new + 0 unchanged - 0 fixed = 8 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m  7s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| 

[jira] [Commented] (YARN-10375) CS Mapping rule config parser should return MappingRule objects

2020-09-01 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188632#comment-17188632
 ] 

Szilard Nemeth commented on YARN-10375:
---

Hi [~shuzirra],
Patch LGTM +1, let's wait for the other dependent patches to be merged.

> CS Mapping rule config parser should return MappingRule objects
> ---
>
> Key: YARN-10375
> URL: https://issues.apache.org/jira/browse/YARN-10375
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10375.001.patch
>
>
> As per the design document attached to the umbrella Jira (YARN-10370), we 
> need to change the current configuration parser to create the new object 
> structure when parsing the current mapping rule format. This should be 100% 
> backwards compatible, to make sure upon upgrade all mapping rules will behave 
> the same.






[jira] [Commented] (YARN-10372) Create MappingRule class to represent each CS mapping rule

2020-09-01 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188605#comment-17188605
 ] 

Szilard Nemeth commented on YARN-10372:
---

Hi [~shuzirra],
Thanks for this patch. LGTM +1

> Create MappingRule class to represent each CS mapping rule
> --
>
> Key: YARN-10372
> URL: https://issues.apache.org/jira/browse/YARN-10372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10372.001.patch, YARN-10372.002.patch
>
>
> As per the design document attached to the umbrella Jira (YARN-10370), we 
> need a class which represents a mapping rule, which can be evaluated, and 
> determines if the rule applies to the current application placement and 
> determines the action to be taken.






[jira] [Commented] (YARN-10374) Create Actions for CS mapping rules

2020-09-01 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188597#comment-17188597
 ] 

Szilard Nemeth commented on YARN-10374:
---

Hi [~shuzirra],

Some comments: 

1. MappingQueuePath#getFullPath: I don't know how frequently this method is 
intended to be called but it might be a good idea to not construct the full 
path string on the fly with: 
parent + DOT + leaf
Wouldn't it be good to store the full path as a field instead? (A sketch of this 
idea follows at the end of this comment.)

2. MappingQueuePath#setFromFullPath: I have a question about the if condition: a 
string like ".someleaf" is also accepted because of the lastDotIdx > 1 condition. 
I think we should handle this case as well and forbid input strings like this, 
but I might be wrong here.

3. Javadoc of MappingRuleResult#queue can be a bit more specific: Can you 
please link the MappingRuleResultType#PLACE instead of simply using PLACE in 
the javadoc? 
This comment applies to other fields like RESULT_REJECT, RESULT_SKIP and 
MappingRuleResult#getQueue as well.

4. Nit: MappingRuleResult#toString: String.format(..) would be way more 
readable.

5. MappingRuleActions.PlaceToQueueAction#validate: typo in "calass"

6. MappingRuleActions.RejectAction: Javadoc seems weird: "on the user's side 
letting it know", either use "them" or something else but not "it".

7. MappingRuleActions.RejectAction#execute: Typo in MappingRuleResut

8. Nit: MappingRuleActions.VariableUpdateAction#VariableUpdateAction: Javadoc 
for parameter variableValue is missing.

9. Typos in javadoc of MappingRuleActions.VariableUpdateAction#execute: 
"evalutaion", "exectute"

10. Nit: TestMappingRuleActions: Assertxxx methods in the beginning can be 
private

11. Nit: MappingRuleValidationContextImpl#validateStaticQueuePath: Can you 
extract the code block belonging to the if (queue == null) ? This would make 
the code way more readable.

12. Nit: Strings as a first parameter of new YarnException() could be formatted 
with String.format() instead to make the code easier to understand.

13. Typo in comment block inside 
MappingRuleValidationContextImpl#validateDynamicQueuePath: accessibe --> 
accessible

14. Typo in comment of 
TestMappingRuleValidationContextImpl#testContextVariables: 
"//hence all quese were considered static" --> queues I guess.

15. In TestMappingRuleValidationContextImpl#testContextVariables: I was not 
sure first if I liked the tale with the comments, but let's say I like them 
rather than not :)
Can you add newlines between the comments for better readability?

16. Nit: TestMappingRuleValidationContextImpl: assertXXX methods can be 
private. 

17. TestMappingRuleValidationContextImpl#assertValidPath: Can you log the 
exception with LOG.error(message, e) ?

Please file a follow-up jira to fix these. Thanks
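
As an aid for that follow-up, a minimal sketch of the idea in point 1 
(precomputing the full path); the field and constructor shapes are assumptions 
for illustration, not the actual MappingQueuePath implementation:
{code:java}
// Illustrative sketch only, not the actual MappingQueuePath class.
public class MappingQueuePath {
  private static final String DOT = ".";

  private final String parent;
  private final String leaf;
  // Computed once in the constructor instead of on every getFullPath() call.
  private final String fullPath;

  public MappingQueuePath(String parent, String leaf) {
    this.parent = parent;
    this.leaf = leaf;
    this.fullPath =
        (parent == null || parent.isEmpty()) ? leaf : parent + DOT + leaf;
  }

  public String getFullPath() {
    return fullPath;
  }
}
{code}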


> Create Actions for CS mapping rules
> ---
>
> Key: YARN-10374
> URL: https://issues.apache.org/jira/browse/YARN-10374
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10374.001.patch, YARN-10374.002.patch, 
> YARN-10374.003.patch
>
>
> As per the design document attached to the umbrella Jira (YARN-10370), we 
> need to create the classes which represent the different actions we can take 
> when a mapping rule applies to an application placement. There are multiple 
> actions to be implemented, like place to queue or reject application.






[jira] [Commented] (YARN-10413) Change fs2cs to generate mapping rules in the new format

2020-09-01 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188595#comment-17188595
 ] 

Hadoop QA commented on YARN-10413:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 27m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
43s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 30s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| 

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-09-01 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188591#comment-17188591
 ] 

Jim Brennan commented on YARN-10393:


{quote}BTW, the normal calculation between requestID and responseID would be 
 responseID == requestID + 1
{noformat}
// code placeholder
if (responseId != lastHeartbeatID + 1) {
   pendingCompletedContainers.clear();
}
{noformat}
{quote}
Good point [~yuanbo]. But this is the reverse of what we want. We want to do 
the clear() when we get the expected responseID, and we want to skip the 
clear() when we don't get the expected responseId.

But your comment made me realize that my suggestion won't work as is. The NM's 
lastHeartBeatID is only updated when we get a response back from the RM. On the 
missed heartbeat, we don't update it. In the case of a duplicate, the RM 
re-sends the last response again, which will still be lastHeartBeatID + 1. I 
think we need to base this on a flag that we set when we get an exception from 
the request attempt. So something like:
{noformat}
logAggregationReportForAppsTempList.clear();
lastHeartbeatID = response.getResponseId();
// Don't clear if the previous heartbeat was missed.
if (!missedHeartbeat) {
  pendingCompletedContainers.clear();
}
missedHeartbeat = false;
...
} catch (Exception e) {

  // TODO Better error handling. Thread can die with the rest of the
  // NM still running.
  LOG.error("Caught exception in status-updater", e);
  missedHeartbeat = true;
} finally {
{noformat}
On the missed heartbeat, we don't clear {{pendingCompletedContainers}} because 
of the exception, and by adding this change, we will prevent clearing it until 
the heartbeat after we get a response back from the RM. I think the downside is 
that if the RM never got the missed heartbeat (if it failed on the send rather 
than on the response), then the one after the missed one might not be a duplicate, 
and we will potentially re-send completed containers to the RM on the next heartbeat. 
But that should happen only once in this case. I don't think there is any way 
for the NM to tell if the RM's response is a duplicate or not.

What do you think?
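
To make the idea concrete, a self-contained sketch of the flag-based approach 
described above; the names and structure are simplified for illustration and are 
not the actual NodeStatusUpdaterImpl code:
{code:java}
import java.util.ArrayList;
import java.util.List;

// Sketch only: completed containers reported in a heartbeat are kept for one
// extra round whenever the previous heartbeat attempt failed.
public class HeartbeatSketch {
  private final List<String> pendingCompletedContainers = new ArrayList<>();
  private int lastHeartbeatID = 0;
  private boolean missedHeartbeat = false;

  /** One heartbeat round; responseId is null when the RPC attempt failed. */
  void onHeartbeat(Integer responseId, List<String> completedSinceLast) {
    pendingCompletedContainers.addAll(completedSinceLast);
    if (responseId == null) {
      // The RPC failed: keep pendingCompletedContainers so they are re-sent.
      missedHeartbeat = true;
      return;
    }
    lastHeartbeatID = responseId;
    // Only clear when the previous heartbeat was not missed, so containers
    // reported in a lost heartbeat are re-sent (at most) once more.
    if (!missedHeartbeat) {
      pendingCompletedContainers.clear();
    }
    missedHeartbeat = false;
  }
}
{code}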

> MR job live lock caused by completed state container leak in heartbeat 
> between node manager and RM
> --
>
> Key: YARN-10393
> URL: https://issues.apache.org/jira/browse/YARN-10393
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.6.1, 2.7.2, 2.6.2, 3.0.0, 2.9.2, 3.3.0, 3.2.1, 3.1.3, 
> 3.4.0
>Reporter: zhenzhao wang
>Assignee: zhenzhao wang
>Priority: Major
>
> This was a bug we had seen multiple times on Hadoop 2.6.2. And the following 
> analysis is based on the core dump, logs, and code in 2017 with Hadoop 2.6.2. 
> We hadn't seen it after 2.9 in our env. However, it was because of the RPC 
> retry policy change and other changes. There's still a possibility even with 
> the current code if I didn't miss anything.
> *High-level description:*
>  We had seen a starving mapper issue several times. The MR job was stuck in a 
> live lock state and couldn't make any progress. The queue was full, so the 
> pending mapper couldn't get any resource to continue, and the application master 
> failed to preempt the reducer, thus causing the job to be stuck. The reason 
> why the application master didn’t preempt the reducer was that there was a 
> leaked container in assigned mappers. The node manager failed to report the 
> completed container to the resource manager.
> *Detailed steps:*
>  
>  # Container_1501226097332_249991_01_000199 was assigned to 
> attempt_1501226097332_249991_m_95_0 on 2017-08-08 16:00:00,417.
> {code:java}
> appmaster.log:6464:2017-08-08 16:00:00,417 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned 
> container container_1501226097332_249991_01_000199 to 
> attempt_1501226097332_249991_m_95_0
> {code}
>  # The container finished on 2017-08-08 16:02:53,313.
> {code:java}
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1501226097332_249991_01_000199 transitioned from RUNNING 
> to EXITED_WITH_SUCCESS
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_1501226097332_249991_01_000199
> {code}
>  # The NodeStatusUpdater got an exception in the heartbeat on 2017-08-08 
> 16:07:04,238. In fact, the heartbeat request is actually handled by resource 
> 

[jira] [Commented] (YARN-10031) Create a general purpose log request with additional query parameters

2020-09-01 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188589#comment-17188589
 ] 

Hadoop QA commented on YARN-10031:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
5s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m 
37s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 
55s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
21m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
2s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
25s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  
9s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m  
1s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 21m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 
13s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 18m 
13s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 47s{color} | {color:orange} root: The patch generated 8 new + 36 unchanged - 
0 fixed = 44 total (was 36) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
14s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| 

[jira] [Commented] (YARN-10374) Create Actions for CS mapping rules

2020-09-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188578#comment-17188578
 ] 

Peter Bacsko commented on YARN-10374:
-

Thanks [~shuzirra], patch has been  submitted to trunk.

> Create Actions for CS mapping rules
> ---
>
> Key: YARN-10374
> URL: https://issues.apache.org/jira/browse/YARN-10374
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10374.001.patch, YARN-10374.002.patch, 
> YARN-10374.003.patch
>
>
> As per the design document attached to the umbrella Jira (YARN-10370), we 
> need to create the classes which represent the different actions we can take 
> when a mapping rule applies to an application placement. There are multiple 
> actions to be implemented, like place to queue or reject application.






[jira] [Commented] (YARN-10374) Create Actions for CS mapping rules

2020-09-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188565#comment-17188565
 ] 

Peter Bacsko commented on YARN-10374:
-

LGTM +1. Last warning isn't relevant as the source code will be relocated later.

Committing soon.

> Create Actions for CS mapping rules
> ---
>
> Key: YARN-10374
> URL: https://issues.apache.org/jira/browse/YARN-10374
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10374.001.patch, YARN-10374.002.patch, 
> YARN-10374.003.patch
>
>
> As per the design document attached to the umbrella Jira (YARN-10370), we 
> need to create the classes which represent the different actions we can take 
> when a mapping rule applies to an application placement. There are multiple 
> actions to be implemented, like place to queue or reject application.






[jira] [Commented] (YARN-10374) Create Actions for CS mapping rules

2020-09-01 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188550#comment-17188550
 ] 

Hadoop QA commented on YARN-10374:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
52s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 37s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| 

[jira] [Updated] (YARN-10413) Change fs2cs to generate mapping rules in the new format

2020-09-01 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10413:

Attachment: YARN-10413-001.patch

> Change fs2cs to generate mapping rules in the new format
> 
>
> Key: YARN-10413
> URL: https://issues.apache.org/jira/browse/YARN-10413
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10413-001.patch
>
>
> Currently, the converter tool {{fs2cs}} can convert placement rules to 
> mapping rules, but the differences are too big.
> It should be modified to generate placement rules for the new engine and 
> output them to a separate JSON file.






[jira] [Commented] (YARN-10031) Create a general purpose log request with additional query parameters

2020-09-01 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188433#comment-17188433
 ] 

Andras Gyori commented on YARN-10031:
-

A new patch has been uploaded, where the new query is implemented for the IFile 
format as well. Also added support for combining two comparisons, as in the 
following:
{noformat}
userlogs?file_size=<1500&file_size=>0{noformat}
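
A minimal sketch of how such a comparison expression could be parsed and applied; 
the class and method names here are assumptions for illustration, not the actual 
patch:
{code:java}
// Sketch only: parses expressions like ">0" or "<1500" into an operator plus a
// byte threshold, and checks a file size against it.
public final class SizeFilter {
  enum Op { GREATER, LESS, EQUAL }

  private final Op op;
  private final long bytes;

  private SizeFilter(Op op, long bytes) {
    this.op = op;
    this.bytes = bytes;
  }

  static SizeFilter parse(String expr) {
    char c = expr.charAt(0);
    if (c == '>') {
      return new SizeFilter(Op.GREATER, Long.parseLong(expr.substring(1)));
    } else if (c == '<') {
      return new SizeFilter(Op.LESS, Long.parseLong(expr.substring(1)));
    }
    return new SizeFilter(Op.EQUAL, Long.parseLong(expr));
  }

  boolean matches(long fileSize) {
    switch (op) {
      case GREATER: return fileSize > bytes;
      case LESS:    return fileSize < bytes;
      default:      return fileSize == bytes;
    }
  }
}
{code}
With something like this, {{file_size=>0}} and {{file_size=<1500}} would each map 
to one filter, and a log file would have to satisfy both to be returned.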

> Create a general purpose log request with additional query parameters
> -
>
> Key: YARN-10031
> URL: https://issues.apache.org/jira/browse/YARN-10031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Adam Antal
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10031-WIP.001.patch, YARN-10031.001.patch
>
>
> The current endpoints are robust but not very flexible with regards to 
> filtering options. I suggest to add an endpoint which provides filtering 
> options.
> E.g.:
> In ATS we have multiple endpoints:
> /containers/{containerid}/logs/{filename}
> /containerlogs/{containerid}/{filename}
> We could add @QueryParams parameters to the REST endpoints like this:
> /containers/{containerid}/logs?fileName=stderr=FAILED=nm45






[jira] [Updated] (YARN-10031) Create a general purpose log request with additional query parameters

2020-09-01 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10031:

Attachment: YARN-10031.001.patch

> Create a general purpose log request with additional query parameters
> -
>
> Key: YARN-10031
> URL: https://issues.apache.org/jira/browse/YARN-10031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Adam Antal
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10031-WIP.001.patch, YARN-10031.001.patch
>
>
> The current endpoints are robust but not very flexible with regards to 
> filtering options. I suggest to add an endpoint which provides filtering 
> options.
> E.g.:
> In ATS we have multiple endpoints:
> /containers/{containerid}/logs/{filename}
> /containerlogs/{containerid}/{filename}
> We could add @QueryParams parameters to the REST endpoints like this:
> /containers/{containerid}/logs?fileName=stderr=FAILED=nm45
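
A minimal sketch of the kind of endpoint the description suggests, using JAX-RS; 
the resource class and the query parameter names beyond {{fileName}} are 
assumptions for illustration, not the actual YARN REST API:
{code:java}
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Response;

@Path("/containers")
public class ContainerLogsResource {

  // Sketch only: each filter is an optional query parameter; null means
  // "no filter on this attribute".
  @GET
  @Path("/{containerid}/logs")
  public Response getLogs(@PathParam("containerid") String containerId,
                          @QueryParam("fileName") String fileName,
                          @QueryParam("containerState") String containerState,
                          @QueryParam("nodeId") String nodeId) {
    // A real implementation would look up the logs for containerId and apply
    // the non-null filters before returning the matching files.
    String summary = String.format(
        "containerId=%s, fileName=%s, containerState=%s, nodeId=%s",
        containerId, fileName, containerState, nodeId);
    return Response.ok(summary).build();
  }
}
{code}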






[jira] [Commented] (YARN-10386) Create new JSON schema for Placement Rules

2020-09-01 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188421#comment-17188421
 ] 

Adam Antal commented on YARN-10386:
---

Thanks for the addendum patch [~pbacsko]. +1 from me, committed to trunk (let's 
not block other issues with this).

> Create new JSON schema for Placement Rules
> --
>
> Key: YARN-10386
> URL: https://issues.apache.org/jira/browse/YARN-10386
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: MappingRulesDescription_v1.json, YARN-10386-001.patch, 
> YARN-10386-002.patch, YARN-10386-003.patch, YARN-10386-004.patch, 
> YARN-10386-005.patch, YARN-10386-006.patch, YARN-10386-007.patch, 
> YARN-10386-008.patch, YARN-10386-appendum.patch, YARN-10386-appendum2.patch
>
>
> Tasks in this JIRA:
>  # Create new JSON schema
>  # Add Maven plugin which generates Java POJOs based on the schema
>  # Add helper class which essentially does the same as #2 (for dev purposes)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10374) Create Actions for CS mapping rules

2020-09-01 Thread Gergely Pollak (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollak updated YARN-10374:
--
Attachment: YARN-10374.003.patch

> Create Actions for CS mapping rules
> ---
>
> Key: YARN-10374
> URL: https://issues.apache.org/jira/browse/YARN-10374
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10374.001.patch, YARN-10374.002.patch, 
> YARN-10374.003.patch
>
>
> As per the design document attached to the umbrella Jira (YARN-10370), we 
> need to create the classes which represent the different actions we can take 
> when a mapping rule applies to an application placement. There are multiple 
> actions to be implemented, like place to queue or reject application.






[jira] [Commented] (YARN-10372) Create MappingRule class to represent each CS mapping rule

2020-09-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188401#comment-17188401
 ] 

Peter Bacsko commented on YARN-10372:
-

Ok, this patch LGTM, pending on YARN-10374.

> Create MappingRule class to represent each CS mapping rule
> --
>
> Key: YARN-10372
> URL: https://issues.apache.org/jira/browse/YARN-10372
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10372.001.patch, YARN-10372.002.patch
>
>
> As per the design document attached to the umbrella Jira (YARN-10370), we 
> need a class which represents a mapping rule, which can be evaluated, and 
> determines if the rule applies to the current application placement and 
> determines the action to be taken.






[jira] [Commented] (YARN-10374) Create Actions for CS mapping rules

2020-09-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188399#comment-17188399
 ] 

Peter Bacsko commented on YARN-10374:
-

[~shuzirra] thanks for fixing the issues. Unfortunately, there are still some 
checkstyle problems; could you also fix those?

> Create Actions for CS mapping rules
> ---
>
> Key: YARN-10374
> URL: https://issues.apache.org/jira/browse/YARN-10374
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10374.001.patch, YARN-10374.002.patch
>
>
> As per the design document attached to the umbrella Jira (YARN-10370), we 
> need to create the classes which represent the different actions we can take 
> when a mapping rule applies to an application placement. There are multiple 
> actions to be implemented, like place to queue or reject application.






[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-09-01 Thread Yuanbo Liu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188358#comment-17188358
 ] 

Yuanbo Liu commented on YARN-10393:
---

[~adam.antal] Thanks for your comments. I misunderstood [~Jim_Brennan] 's 
solution.  
+1 for that. 

BTW, the normal calculation between requestID and responseID would be 

responseID == requestID + 1
{code:java}
// code placeholder
if (responseId != lastHeartbeatID + 1) {
   pendingCompletedContainers.clear();
}
{code}
correct me if I'm wrong.

 

[~wzzdreamer] What are your thoughts about the solution?

> MR job live lock caused by completed state container leak in heartbeat 
> between node manager and RM
> --
>
> Key: YARN-10393
> URL: https://issues.apache.org/jira/browse/YARN-10393
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.6.1, 2.7.2, 2.6.2, 3.0.0, 2.9.2, 3.3.0, 3.2.1, 3.1.3, 
> 3.4.0
>Reporter: zhenzhao wang
>Assignee: zhenzhao wang
>Priority: Major
>
> This was a bug we had seen multiple times on Hadoop 2.6.2. And the following 
> analysis is based on the core dump, logs, and code in 2017 with Hadoop 2.6.2. 
> We hadn't seen it after 2.9 in our env. However, it was because of the RPC 
> retry policy change and other changes. There's still a possibility even with 
> the current code if I didn't miss anything.
> *High-level description:*
>  We had seen a starving mapper issue several times. The MR job was stuck in a 
> live lock state and couldn't make any progress. The queue was full, so the 
> pending mapper couldn't get any resource to continue, and the application master 
> failed to preempt the reducer, thus causing the job to be stuck. The reason 
> why the application master didn’t preempt the reducer was that there was a 
> leaked container in assigned mappers. The node manager failed to report the 
> completed container to the resource manager.
> *Detailed steps:*
>  
>  # Container_1501226097332_249991_01_000199 was assigned to 
> attempt_1501226097332_249991_m_95_0 on 2017-08-08 16:00:00,417.
> {code:java}
> appmaster.log:6464:2017-08-08 16:00:00,417 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned 
> container container_1501226097332_249991_01_000199 to 
> attempt_1501226097332_249991_m_95_0
> {code}
>  # The container finished on 2017-08-08 16:02:53,313.
> {code:java}
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1501226097332_249991_01_000199 transitioned from RUNNING 
> to EXITED_WITH_SUCCESS
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_1501226097332_249991_01_000199
> {code}
>  # The NodeStatusUpdater got an exception in the heartbeat on 2017-08-08 
> 16:07:04,238. In fact, the heartbeat request was actually handled by the 
> resource manager; however, the node manager failed to receive the response. 
> Let's assume heartBeatResponseId=$hid in the node manager. According to our 
> current configuration, the next heartbeat will be 10s later.
> {code:java}
> 2017-08-08 16:07:04,238 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught 
> exception in status-updater
> java.io.IOException: Failed on local exception: java.io.IOException: 
> Connection reset by peer; Host Details : local host is: ; destination host 
> is: XXX
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy33.nodeHeartbeat(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
> at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy34.nodeHeartbeat(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:597)
> at 

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-09-01 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188306#comment-17188306
 ] 

Adam Antal commented on YARN-10393:
---

Thanks for the discussion above; I think we've seen several approaches there.

In my opinion, our best solution so far is this one:
bq. Another option would be to remove pendingCompletedContainers.clear() from 
the end of removeOrTrackCompletedContainersFromContext() - it doesn't really 
belong there anyway, and do it directly in the run() method. By doing that, 
you can make it conditional on the heartbeatId changing. Something like: [...]

Also, that would generally resolve all heartbeat processing problems. The 
concrete bug report was about the completed containers, but this approach is a 
more general fix for this kind of bug, and it's quite elegant.
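
For readers following along, here is a simplified, self-contained sketch of 
that idea. It is not the snippet elided above, and the names (afterHeartbeat, 
lastHeartbeatId, pendingCompletedContainers as a plain list) are assumptions 
that do not necessarily match NodeStatusUpdaterImpl:
{code:java}
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of "clear pendingCompletedContainers only when the
// heartbeat id advanced". All names are assumptions and do not necessarily
// match NodeStatusUpdaterImpl; the real logic lives in its run() method and
// in removeOrTrackCompletedContainersFromContext().
class HeartbeatClearSketch {
  private long lastHeartbeatId = 0;
  // Completed containers already reported to the RM but not yet acknowledged.
  private final List<String> pendingCompletedContainers = new ArrayList<>();

  /** Called after each heartbeat with the response id the RM sent back. */
  void afterHeartbeat(long responseId) {
    if (responseId != lastHeartbeatId) {
      // The RM processed a new heartbeat, so the completed containers we
      // reported in the previous status report are safe to drop.
      pendingCompletedContainers.clear();
      lastHeartbeatId = responseId;
    }
    // Otherwise keep them, so they are re-reported in the next heartbeat
    // instead of leaking (the leak is what caused the live lock here).
  }
}
{code}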

Regarding the two ways you mentioned [~yuanbo], I'd prefer option 2:
bq. 2. Resend the container ids from recentlyStoppedContainers periodically 
(maybe every 1 minute?); once an id is acknowledged in an RM response, it is 
removed from recentlyStoppedContainers and never retried again.

I believe option 1 does not guarantee to resolve anything in the case of e.g. 
an intermittent network disruption, where the same thing happens: even if we 
retry 3 times, we may end up in the same situation. If there's a solution that 
doesn't rely on the assumption that the connection issue resolves before the 
retries are exhausted, I think we should go there: it makes Hadoop more 
resilient rather than less resilient (because we no longer depend on that 
assumption). Obviously everything is configurable (you can allow arbitrarily 
many retries), and you cannot achieve 100% resilience, but I find option 2 the 
logically better solution.

However, even option 2 is a less general solution than the one mentioned by 
[~Jim_Brennan]. I would be most comfortable with combining the two solutions, 
but that might be too much for this issue. Let's elaborate on this a bit. What 
do others think?

> MR job live lock caused by completed state container leak in heartbeat 
> between node manager and RM
> --
>
> Key: YARN-10393
> URL: https://issues.apache.org/jira/browse/YARN-10393
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.6.1, 2.7.2, 2.6.2, 3.0.0, 2.9.2, 3.3.0, 3.2.1, 3.1.3, 
> 3.4.0
>Reporter: zhenzhao wang
>Assignee: zhenzhao wang
>Priority: Major
>
> This was a bug we had seen multiple times on Hadoop 2.6.2, and the following 
> analysis is based on the core dump, logs, and code from 2017 with Hadoop 2.6.2. 
> We hadn't seen it after 2.9 in our env, but that was because of the RPC retry 
> policy change and other changes. There's still a possibility even with the 
> current code, if I didn't miss anything.
> *High-level description:*
>  We had seen a starving mapper issue several times. The MR job got stuck in a 
> live-lock state and couldn't make any progress. The queue was full, so the 
> pending mapper couldn't get any resources to continue, and the application 
> master failed to preempt the reducer, thus causing the job to be stuck. The 
> reason the application master didn't preempt the reducer was that there was a 
> leaked container in the assigned mappers: the node manager failed to report 
> the completed container to the resource manager.
> *Detailed steps:*
>  
>  # Container_1501226097332_249991_01_000199 was assigned to 
> attempt_1501226097332_249991_m_95_0 on 2017-08-08 16:00:00,417.
> {code:java}
> appmaster.log:6464:2017-08-08 16:00:00,417 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned 
> container container_1501226097332_249991_01_000199 to 
> attempt_1501226097332_249991_m_95_0
> {code}
>  # The container finished on 2017-08-08 16:02:53,313.
> {code:java}
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1501226097332_249991_01_000199 transitioned from RUNNING 
> to EXITED_WITH_SUCCESS
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_1501226097332_249991_01_000199
> {code}
>  # The NodeStatusUpdater got an exception in the heartbeat on 2017-08-08 
> 16:07:04,238. In fact, the heartbeat request was actually handled by the 
> resource manager; however, the node manager failed to receive the response. 
> Let's assume heartBeatResponseId=$hid in the node manager. According to our 
> current configuration, the next heartbeat will be 10s later.
> {code:java}
> 2017-08-08 16:07:04,238 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: 

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-09-01 Thread Yuanbo Liu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188276#comment-17188276
 ] 

Yuanbo Liu commented on YARN-10393:
---

[~adam.antal] Sorry to interrupt; any thoughts on this issue?

> MR job live lock caused by completed state container leak in heartbeat 
> between node manager and RM
> --
>
> Key: YARN-10393
> URL: https://issues.apache.org/jira/browse/YARN-10393
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.6.1, 2.7.2, 2.6.2, 3.0.0, 2.9.2, 3.3.0, 3.2.1, 3.1.3, 
> 3.4.0
>Reporter: zhenzhao wang
>Assignee: zhenzhao wang
>Priority: Major
>
> This was a bug we had seen multiple times on Hadoop 2.6.2, and the following 
> analysis is based on the core dump, logs, and code from 2017 with Hadoop 2.6.2. 
> We hadn't seen it after 2.9 in our env, but that was because of the RPC retry 
> policy change and other changes. There's still a possibility even with the 
> current code, if I didn't miss anything.
> *High-level description:*
>  We had seen a starving mapper issue several times. The MR job got stuck in a 
> live-lock state and couldn't make any progress. The queue was full, so the 
> pending mapper couldn't get any resources to continue, and the application 
> master failed to preempt the reducer, thus causing the job to be stuck. The 
> reason the application master didn't preempt the reducer was that there was a 
> leaked container in the assigned mappers: the node manager failed to report 
> the completed container to the resource manager.
> *Detailed steps:*
>  
>  # Container_1501226097332_249991_01_000199 was assigned to 
> attempt_1501226097332_249991_m_95_0 on 2017-08-08 16:00:00,417.
> {code:java}
> appmaster.log:6464:2017-08-08 16:00:00,417 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned 
> container container_1501226097332_249991_01_000199 to 
> attempt_1501226097332_249991_m_95_0
> {code}
>  # The container finished on 2017-08-08 16:02:53,313.
> {code:java}
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1501226097332_249991_01_000199 transitioned from RUNNING 
> to EXITED_WITH_SUCCESS
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_1501226097332_249991_01_000199
> {code}
>  # The NodeStatusUpdater got an exception in the heartbeat on 2017-08-08 
> 16:07:04,238. In fact, the heartbeat request was actually handled by the 
> resource manager; however, the node manager failed to receive the response. 
> Let's assume heartBeatResponseId=$hid in the node manager. According to our 
> current configuration, the next heartbeat will be 10s later.
> {code:java}
> 2017-08-08 16:07:04,238 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught 
> exception in status-updater
> java.io.IOException: Failed on local exception: java.io.IOException: 
> Connection reset by peer; Host Details : local host is: ; destination host 
> is: XXX
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy33.nodeHeartbeat(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
> at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy34.nodeHeartbeat(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:597)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>   

[jira] [Commented] (YARN-10393) MR job live lock caused by completed state container leak in heartbeat between node manager and RM

2020-09-01 Thread Yuanbo Liu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17188270#comment-17188270
 ] 

Yuanbo Liu commented on YARN-10393:
---

[~Jim_Brennan] Thanks for the reply.

Basically I'm open to these two options:

1. Change the heartbeat-sending code to send the heartbeat again if no response 
is received (like [~wzzdreamer] did in the PR), but I'd prefer to introduce 
some max-retry code to guard against a potential infinite loop.

2. Resend the container ids from recentlyStoppedContainers periodically (maybe 
every 1 minute?); once an id is acknowledged in an RM response, it is removed 
from recentlyStoppedContainers and never retried again.
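
A minimal sketch of what option 2 could look like, using hypothetical names 
(the real bookkeeping lives in NodeStatusUpdaterImpl around 
recentlyStoppedContainers and the RM heartbeat response):
{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of option 2 with hypothetical names: keep re-reporting the
// ids in recentlyStoppedContainers on every heartbeat until the RM explicitly
// acknowledges them, then drop them so they are never retried again.
class CompletedContainerTracker {
  // Completed containers that still need to be acknowledged by the RM.
  private final Set<String> recentlyStoppedContainers = new HashSet<>();

  void containerCompleted(String containerId) {
    recentlyStoppedContainers.add(containerId);
  }

  /** Ids to include in the next heartbeat: everything not yet acknowledged. */
  Set<String> containersToReport() {
    return new HashSet<>(recentlyStoppedContainers);
  }

  /** Called with the ids the RM acknowledged in its heartbeat response. */
  void onAcknowledged(List<String> ackedByRm) {
    // Only acknowledged ids are removed; anything the RM never saw (e.g.
    // because the response was lost) stays and is reported again.
    recentlyStoppedContainers.removeAll(ackedByRm);
  }
}
{code}
The key design point is that removal happens only on explicit acknowledgement, 
so a lost response simply causes the same ids to be reported again in the next 
heartbeat rather than being leaked.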

> MR job live lock caused by completed state container leak in heartbeat 
> between node manager and RM
> --
>
> Key: YARN-10393
> URL: https://issues.apache.org/jira/browse/YARN-10393
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.6.1, 2.7.2, 2.6.2, 3.0.0, 2.9.2, 3.3.0, 3.2.1, 3.1.3, 
> 3.4.0
>Reporter: zhenzhao wang
>Assignee: zhenzhao wang
>Priority: Major
>
> This was a bug we had seen multiple times on Hadoop 2.6.2, and the following 
> analysis is based on the core dump, logs, and code from 2017 with Hadoop 2.6.2. 
> We hadn't seen it after 2.9 in our env, but that was because of the RPC retry 
> policy change and other changes. There's still a possibility even with the 
> current code, if I didn't miss anything.
> *High-level description:*
>  We had seen a starving mapper issue several times. The MR job got stuck in a 
> live-lock state and couldn't make any progress. The queue was full, so the 
> pending mapper couldn't get any resources to continue, and the application 
> master failed to preempt the reducer, thus causing the job to be stuck. The 
> reason the application master didn't preempt the reducer was that there was a 
> leaked container in the assigned mappers: the node manager failed to report 
> the completed container to the resource manager.
> *Detailed steps:*
>  
>  # Container_1501226097332_249991_01_000199 was assigned to 
> attempt_1501226097332_249991_m_95_0 on 2017-08-08 16:00:00,417.
> {code:java}
> appmaster.log:6464:2017-08-08 16:00:00,417 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned 
> container container_1501226097332_249991_01_000199 to 
> attempt_1501226097332_249991_m_95_0
> {code}
>  # The container finished on 2017-08-08 16:02:53,313.
> {code:java}
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1501226097332_249991_01_000199 transitioned from RUNNING 
> to EXITED_WITH_SUCCESS
> yarn-mapred-nodemanager-.log.1:2017-08-08 16:02:53,313 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_1501226097332_249991_01_000199
> {code}
>  # The NodeStatusUpdater got an exception in the heartbeat on 2017-08-08 
> 16:07:04,238. In fact, the heartbeat request was actually handled by the 
> resource manager; however, the node manager failed to receive the response. 
> Let's assume heartBeatResponseId=$hid in the node manager. According to our 
> current configuration, the next heartbeat will be 10s later.
> {code:java}
> 2017-08-08 16:07:04,238 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught 
> exception in status-updater
> java.io.IOException: Failed on local exception: java.io.IOException: 
> Connection reset by peer; Host Details : local host is: ; destination host 
> is: XXX
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy33.nodeHeartbeat(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.nodeHeartbeat(ResourceTrackerPBClientImpl.java:80)
> at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy34.nodeHeartbeat(Unknown Source)
> at 
> 

[jira] [Updated] (YARN-10418) Add css attribute "word-break" to Diagnostic messages in webUI of RM

2020-09-01 Thread Dapeng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated YARN-10418:
--
Attachment: after.png

> Add css attribute "word-break" to Diagnostic messages in webUI of RM
> 
>
> Key: YARN-10418
> URL: https://issues.apache.org/jira/browse/YARN-10418
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
>Priority: Trivial
> Attachments: YARN-10418.001.patch, after.png, before.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10418) Add css attribute "word-break" to Diagnostic messages in webUI of RM

2020-09-01 Thread Dapeng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated YARN-10418:
--
Attachment: before.png

> Add css attribute "word-break" to Diagnostic messages in webUI of RM
> 
>
> Key: YARN-10418
> URL: https://issues.apache.org/jira/browse/YARN-10418
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
>Priority: Trivial
> Attachments: YARN-10418.001.patch, before.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10418) Add css attribute "word-break" to Diagnostic messages in webUI of RM

2020-09-01 Thread Dapeng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated YARN-10418:
--
Attachment: YARN-10418.001.patch

> Add css attribute "word-break" to Diagnostic messages in webUI of RM
> 
>
> Key: YARN-10418
> URL: https://issues.apache.org/jira/browse/YARN-10418
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
>Priority: Trivial
> Attachments: YARN-10418.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10418) Add css attribute "word-break" to Diagnostic messages in webUI of RM

2020-09-01 Thread Dapeng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated YARN-10418:
--
Summary: Add css attribute "word-break" to Diagnostic messages in webUI of 
RM  (was: Diagnostic messages in RM webUI is better to add "word-break" css 
attribute)

> Add css attribute "word-break" to Diagnostic messages in webUI of RM
> 
>
> Key: YARN-10418
> URL: https://issues.apache.org/jira/browse/YARN-10418
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10418) Diagnostic messages in RM webUI is better to add "word-break" css attribute

2020-09-01 Thread Dapeng Sun (Jira)
Dapeng Sun created YARN-10418:
-

 Summary: Diagnostic messages in RM webUI is better to add 
"word-break" css attribute
 Key: YARN-10418
 URL: https://issues.apache.org/jira/browse/YARN-10418
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Dapeng Sun
Assignee: Dapeng Sun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org