[jira] [Commented] (YARN-10202) Fix documentation about NodeAttributes.

2020-03-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072374#comment-17072374
 ] 

Hadoop QA commented on YARN-10202:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
53s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
37m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 57m 31s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue | YARN-10202 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12997081/YARN-10202.001.patch |
| Optional Tests |  dupname  asflicense  mvnsite  |
| uname | Linux 361117d85c3b 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c734d24 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 309 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25793/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Fix documentation about NodeAttributes.
> ---
>
> Key: YARN-10202
> URL: https://issues.apache.org/jira/browse/YARN-10202
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.2.1
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Minor
> Attachments: YARN-10202.001.patch
>
>
> {noformat:title=NodeAttributes.md}
> The above SchedulingRequest requests for 1 container on nodes that must 
> satisfy following constraints:
> 1. Node attribute *`rm.yarn.io/python`* doesn't exist on the node or it exist 
> but its value is not equal to 3
> 2. Node attribute *`rm.yarn.io/java`* must exist on the node and its value is 
> equal to 1.8
> {noformat}
> should be
> {noformat}
> The above SchedulingRequest requests for 1 container on nodes that must 
> satisfy following constraints:
> 1. Node attribute *`rm.yarn.io/python`* doesn't exist on the node or it exist 
> but its value is not equal to 3
> 2. Node attribute *`rm.yarn.io/java`* must exist on the node and its value is 
> equal to 1.8
> {noformat}






[jira] [Updated] (YARN-10202) Fix documentation about NodeAttributes.

2020-03-31 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-10202:
-
Hadoop Flags: Reviewed
  Issue Type: Bug  (was: Improvement)

> Fix documentation about NodeAttributes.
> ---
>
> Key: YARN-10202
> URL: https://issues.apache.org/jira/browse/YARN-10202
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.2.1
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Minor
> Attachments: YARN-10202.001.patch
>
>
> {noformat:title=NodeAttributes.md}
> The above SchedulingRequest requests for 1 container on nodes that must 
> satisfy following constraints:
> 1. Node attribute *`rm.yarn.io/python`* doesn't exist on the node or it exist 
> but its value is not equal to 3
> 2. Node attribute *`rm.yarn.io/java`* must exist on the node and its value is 
> equal to 1.8
> {noformat}
> should be
> {noformat}
> The above SchedulingRequest requests for 1 container on nodes that must 
> satisfy following constraints:
> 1. Node attribute *`rm.yarn.io/python`* doesn't exist on the node or it exist 
> but its value is not equal to 3
> 2. Node attribute *`rm.yarn.io/java`* must exist on the node and its value is 
> equal to 1.8
> {noformat}






[jira] [Updated] (YARN-10202) Fix documentation about NodeAttributes.

2020-03-31 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-10202:
-
Component/s: documentation

> Fix documentation about NodeAttributes.
> ---
>
> Key: YARN-10202
> URL: https://issues.apache.org/jira/browse/YARN-10202
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.2.1
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Minor
> Attachments: YARN-10202.001.patch
>
>
> {noformat:title=NodeAttributes.md}
> The above SchedulingRequest requests for 1 container on nodes that must 
> satisfy following constraints:
> 1. Node attribute *`rm.yarn.io/python`* doesn't exist on the node or it exist 
> but its value is not equal to 3
> 2. Node attribute *`rm.yarn.io/java`* must exist on the node and its value is 
> equal to 1.8
> {noformat}
> should be
> {noformat}
> The above SchedulingRequest requests for 1 container on nodes that must 
> satisfy following constraints:
> 1. Node attribute *`rm.yarn.io/python`* doesn't exist on the node or it exist 
> but its value is not equal to 3
> 2. Node attribute *`rm.yarn.io/java`* must exist on the node and its value is 
> equal to 1.8
> {noformat}






[jira] [Commented] (YARN-10202) Fix documentation about NodeAttributes.

2020-03-31 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072355#comment-17072355
 ] 

Akira Ajisaka commented on YARN-10202:
--

+1

> Fix documentation about NodeAttributes.
> ---
>
> Key: YARN-10202
> URL: https://issues.apache.org/jira/browse/YARN-10202
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.1
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Minor
> Attachments: YARN-10202.001.patch
>
>
> {noformat:title=NodeAttributes.md}
> The above SchedulingRequest requests for 1 container on nodes that must 
> satisfy following constraints:
> 1. Node attribute *`rm.yarn.io/python`* doesn't exist on the node or it exist 
> but its value is not equal to 3
> 2. Node attribute *`rm.yarn.io/java`* must exist on the node and its value is 
> equal to 1.8
> {noformat}
> should be
> {noformat}
> The above SchedulingRequest requests for 1 container on nodes that must 
> satisfy following constraints:
> 1. Node attribute *`rm.yarn.io/python`* doesn't exist on the node or it exist 
> but its value is not equal to 3
> 2. Node attribute *`rm.yarn.io/java`* must exist on the node and its value is 
> equal to 1.8
> {noformat}






[jira] [Updated] (YARN-10217) Expired SampleStat should ignore when generating SlowPeersReport

2020-03-31 Thread Haibin Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibin Huang updated YARN-10217:

Attachment: YARN-10217-002.patch

> Expired SampleStat should ignore when generating SlowPeersReport
> 
>
> Key: YARN-10217
> URL: https://issues.apache.org/jira/browse/YARN-10217
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Haibin Huang
>Priority: Major
> Attachments: YARN-10217-001.patch, YARN-10217-002.patch
>
>
> Created this issue to verify that 
> [HADOOP-16947|https://issues.apache.org/jira/browse/HADOOP-16947] can work 
> well on YARN too.






[jira] [Commented] (YARN-10189) Code cleanup in LeveldbRMStateStore

2020-03-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072164#comment-17072164
 ] 

Hadoop QA commented on YARN-10189:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
45s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  4s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}103m  
9s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}162m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue | YARN-10189 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12998367/YARN-10189.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1f1a920c89b4 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c734d24 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25791/testReport/ |
| Max. process+thread count | 833 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25791/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Code cleanup in LeveldbRMStateStore
> 

[jira] [Commented] (YARN-10217) Expired SampleStat should ignore when generating SlowPeersReport

2020-03-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072138#comment-17072138
 ] 

Hadoop QA commented on YARN-10217:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 26m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
40s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
54s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
47s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 22s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 0s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}238m 43s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.TestDecommission |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue | YARN-10217 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12998357/YARN-10217-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9c6f6c37975c 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c734d24 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| unit | 

[jira] [Updated] (YARN-10189) Code cleanup in LeveldbRMStateStore

2020-03-31 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke updated YARN-10189:
-
Attachment: YARN-10189.001.patch

> Code cleanup in LeveldbRMStateStore
> ---
>
> Key: YARN-10189
> URL: https://issues.apache.org/jira/browse/YARN-10189
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Minor
> Attachments: YARN-10189.001.patch, YARN-10189.POC001.patch, 
> YARN-10189.POC002.patch
>
>
> Some things can be improved:
>  * the throws Exception declaration can be removed from the 
> LeveldbRMStateStore.initInternal method
>  * the key variable is redundant in LeveldbRMStateStore.dbStoreVersion
>  * try blocks can use automatic resource management (try-with-resources; see 
> the sketch after this list) in 
> LeveldbRMStateStore.loadReservationState/loadRMDTSecretManagerKeys/loadRMDTSecretManagerTokens/loadRMApps/...
>  etc.
>  * some methods were copied to LeveldbConfigurationStore (i.e. openDatabase, 
> storeVersion, loadVersion, the CompactionTimerClass nested class); a helper 
> class could be created to reduce the duplicated code
>  * any other cleanup
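
To illustrate the try-with-resources point in the list above, here is a 
minimal, hedged sketch, not the actual LeveldbRMStateStore code. It assumes 
the org.iq80.leveldb API (DB, DBIterator) that Hadoop's LevelDB-backed stores 
build on, and the countEntriesWithPrefix helper is purely illustrative:

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.iq80.leveldb.DB;
import org.iq80.leveldb.DBIterator;

public class LeveldbScanExample {

  /** Counts stored entries whose key starts with the given prefix. */
  static int countEntriesWithPrefix(DB db, String prefix) throws IOException {
    int count = 0;
    byte[] prefixBytes = prefix.getBytes(StandardCharsets.UTF_8);
    // DBIterator is Closeable, so try-with-resources closes it even if the
    // loop below throws, which removes the need for an explicit finally block.
    try (DBIterator iter = db.iterator()) {
      iter.seek(prefixBytes);
      while (iter.hasNext()) {
        Map.Entry<byte[], byte[]> entry = iter.next();
        String key = new String(entry.getKey(), StandardCharsets.UTF_8);
        if (!key.startsWith(prefix)) {
          break;
        }
        count++;
      }
    }
    return count;
  }
}
{code}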






[jira] [Updated] (YARN-10217) Expired SampleStat should ignore when generating SlowPeersReport

2020-03-31 Thread Haibin Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibin Huang updated YARN-10217:

Attachment: YARN-10217-001.patch

> Expired SampleStat should ignore when generating SlowPeersReport
> 
>
> Key: YARN-10217
> URL: https://issues.apache.org/jira/browse/YARN-10217
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Haibin Huang
>Priority: Major
> Attachments: YARN-10217-001.patch
>
>
> Created this issue to verify that 
> [HADOOP-16947|https://issues.apache.org/jira/browse/HADOOP-16947] can work 
> well on YARN too.






[jira] [Created] (YARN-10217) Expired SampleStat should ignore when generating SlowPeersReport

2020-03-31 Thread Haibin Huang (Jira)
Haibin Huang created YARN-10217:
---

 Summary: Expired SampleStat should ignore when generating 
SlowPeersReport
 Key: YARN-10217
 URL: https://issues.apache.org/jira/browse/YARN-10217
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Haibin Huang


Created this issue to verify that 
[HADOOP-16947|https://issues.apache.org/jira/browse/HADOOP-16947] can work well 
on YARN too.






[jira] [Commented] (YARN-10208) Add metric in CapacityScheduler for evaluating the time difference between node heartbeats

2020-03-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071784#comment-17071784
 ] 

Hadoop QA commented on YARN-10208:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
46s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 32s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 98 unchanged - 0 fixed = 99 total (was 98) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 34s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 12s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}154m 12s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoCreatedQueuePreemption
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue | YARN-10208 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12998216/YARN-10208.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a7da979da391 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 80b877a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/25789/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 

[jira] [Commented] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-03-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071666#comment-17071666
 ] 

Hadoop QA commented on YARN-10207:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
46s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 15s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m  
7s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue | YARN-10207 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12998318/YARN-10207.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4c7310692293 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 80b877a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25788/testReport/ |
| Max. process+thread count | 313 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25788/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated 
> logs on the JobHistoryServer Web UI
> 

[jira] [Commented] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-03-31 Thread Siddharth Ahuja (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071621#comment-17071621
 ] 

Siddharth Ahuja commented on YARN-10207:


Fixing up checkstyle warnings as per 
https://builds.apache.org/job/PreCommit-YARN-Build/25787/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt.

> CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated 
> logs on the JobHistoryServer Web UI
> -
>
> Key: YARN-10207
> URL: https://issues.apache.org/jira/browse/YARN-10207
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: YARN-10207.001.patch, YARN-10207.002.patch
>
>
> File descriptor leaks are observed coming from the JobHistoryServer process 
> while it tries to render a "corrupted" aggregated log on the JHS Web UI.
> Issue reproduced using the following steps:
> # Ran a sample Hadoop MR Pi job; it had the id 
> application_1582676649923_0026.
> # Copied an aggregated log file from HDFS to local FS:
> {code}
> hdfs dfs -get 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Updated the TFile metadata at the bottom of this file with some junk to 
> corrupt the file:
> *Before:*
> {code}
>   
> ^@^GVERSION*(^@_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáP
> {code}
> *After:*
> {code}
>   
> ^@^GVERSION*(^@_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáPblah
> {code}
> Notice "blah" (junk) added at the very end.
> # Remove the existing aggregated log file so that it can be replaced by our 
> modified copy from step 3 (otherwise HDFS will refuse to place a file with 
> the same name, since it already exists):
> {code}
> hdfs dfs -rm -r -f 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Upload the corrupted aggregated file back to HDFS:
> {code}
> hdfs dfs -put _8041 
> /tmp/logs/systest/logs/application_1582676649923_0026
> {code}
> # Visit HistoryServer Web UI
> # Click on job_1582676649923_0026
> # Click on "logs" link against the AM (assuming the AM ran on nm_hostname)
> # Review the JHS logs; the following exception will be seen:
> {code}
>   2020-03-24 20:03:48,484 ERROR org.apache.hadoop.yarn.webapp.View: Error 
> getting logs for job_1582676649923_0026
>   java.io.IOException: Not a valid BCFile.
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Reader.(BCFile.java:628)
>   at 
> org.apache.hadoop.io.file.tfile.TFile$Reader.(TFile.java:804)
>   at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.(AggregatedLogFormat.java:588)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.TFileAggregatedLogsBlock.render(TFileAggregatedLogsBlock.java:111)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.renderAggregatedLogsBlock(LogAggregationTFileController.java:341)
>   at 
> org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>   at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>   at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.logs(HsController.java:202)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> 

[jira] [Updated] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-03-31 Thread Siddharth Ahuja (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Ahuja updated YARN-10207:
---
Attachment: YARN-10207.002.patch

> CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated 
> logs on the JobHistoryServer Web UI
> -
>
> Key: YARN-10207
> URL: https://issues.apache.org/jira/browse/YARN-10207
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: YARN-10207.001.patch, YARN-10207.002.patch
>
>
> File descriptor leaks are observed coming from the JobHistoryServer process 
> while it tries to render a "corrupted" aggregated log on the JHS Web UI.
> Issue reproduced using the following steps:
> # Ran a sample Hadoop MR Pi job; it had the id 
> application_1582676649923_0026.
> # Copied an aggregated log file from HDFS to local FS:
> {code}
> hdfs dfs -get 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Updated the TFile metadata at the bottom of this file with some junk to 
> corrupt the file:
> *Before:*
> {code}
>   
> ^@^GVERSION*(^@_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáP
> {code}
> *After:*
> {code}
>   
> ^@^GVERSION*(^@_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáPblah
> {code}
> Notice "blah" (junk) added at the very end.
> # Remove the existing aggregated log file so that it can be replaced by our 
> modified copy from step 3 (otherwise HDFS will refuse to place a file with 
> the same name, since it already exists):
> {code}
> hdfs dfs -rm -r -f 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Upload the corrupted aggregated file back to HDFS:
> {code}
> hdfs dfs -put _8041 
> /tmp/logs/systest/logs/application_1582676649923_0026
> {code}
> # Visit HistoryServer Web UI
> # Click on job_1582676649923_0026
> # Click on "logs" link against the AM (assuming the AM ran on nm_hostname)
> # Review the JHS logs; the following exception will be seen:
> {code}
>   2020-03-24 20:03:48,484 ERROR org.apache.hadoop.yarn.webapp.View: Error 
> getting logs for job_1582676649923_0026
>   java.io.IOException: Not a valid BCFile.
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Reader.(BCFile.java:628)
>   at 
> org.apache.hadoop.io.file.tfile.TFile$Reader.(TFile.java:804)
>   at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.(AggregatedLogFormat.java:588)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.TFileAggregatedLogsBlock.render(TFileAggregatedLogsBlock.java:111)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.renderAggregatedLogsBlock(LogAggregationTFileController.java:341)
>   at 
> org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>   at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>   at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.logs(HsController.java:202)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> 

[jira] [Updated] (YARN-8340) Capacity Scheduler Intra Queue Preemption Should Work When 3rd or more resources enabled.

2020-03-31 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-8340:

Target Version/s: 3.4.0  (was: 3.3.0)

> Capacity Scheduler Intra Queue Preemption Should Work When 3rd or more 
> resources enabled.
> -
>
> Key: YARN-8340
> URL: https://issues.apache.org/jira/browse/YARN-8340
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Critical
>
> Refer to comment from [~eepayne] and discussion below that: 
> https://issues.apache.org/jira/browse/YARN-8292?focusedCommentId=16482689=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16482689
>  for details.






[jira] [Commented] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-03-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071598#comment-17071598
 ] 

Hadoop QA commented on YARN-10207:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 14s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 22s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 4 new + 
18 unchanged - 0 fixed = 22 total (was 18) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
41s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 63m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue | YARN-10207 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12998299/YARN-10207.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f7da3d10215f 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 80b877a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/25787/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25787/testReport/ |
| Max. process+thread count | 309 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 

[jira] [Commented] (YARN-10215) Endpoint for obtaining direct URL for the logs

2020-03-31 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071587#comment-17071587
 ] 

Adam Antal commented on YARN-10215:
---

The Java part:
- The endpoints in {{LogServlet}} are already in use, so we cannot change their 
default behaviour (and return values), which would break compatibility. The 
return value of the functions {{LogServlet#getContainerLogsInfo}} and 
{{LogServlet#getLogFile}} should therefore stay the same as before (redirect). 
Keep in mind that users can use these UIs without CORS protection, in which 
case the servlet functions properly with the redirect.
- What I'd suggest is to add a {{QueryParam}} that changes the response of the 
request (from 307 Temporary Redirect to 206 Partial Content when the parameter 
is specified, otherwise just redirect); see the sketch below. I think the 
switch between response types can be implemented in the new 
{{LogServlet#createLocationResponse}} function.
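
A minimal sketch of that suggestion, with hedged assumptions: the resource 
class, the paths, the manual_redirection parameter name and the NodeManager URL 
below are placeholders rather than the real LogServlet API.

{code}
import java.net.URI;

import javax.ws.rs.DefaultValue;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Response;

@Path("/containers")
public class ContainerLogLocationResource {

  @GET
  @Path("/{containerId}/logs/{filename}")
  public Response getLogFile(
      @PathParam("containerId") String containerId,
      @PathParam("filename") String filename,
      @DefaultValue("false") @QueryParam("manual_redirection") boolean manual) {

    // Placeholder for the NodeManager location the real servlet would resolve.
    URI nmLocation = URI.create("http://nm-host:8042/ws/v1/node/containers/"
        + containerId + "/logs/" + filename);

    if (manual) {
      // No redirect: hand the location back (206, as suggested above) so the
      // UIv2 JS can issue the follow-up request itself and keep its Origin.
      return Response.status(206)
          .header("Location", nmLocation.toString())
          .build();
    }
    // Default, backwards-compatible behaviour: 307 temporary redirect.
    return Response.temporaryRedirect(nmLocation).build();
  }
}
{code}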

The JS part:
- The main logic in {{controllers/yarn-app/logs.js}} seems good to me. Good job!
- If you've tested this patch on a real cluster then I'm assured the conditions 
work fine. Given the quirks of JS comparisons, I'd also add a 
{{headers['location'] !== "null"}} check to the condition of the 
{{handleResponse}} functions just to be sure.
- {{createEmptyContainerLogInfo}} can be moved to a util class, and you can 
reference it from there.

> Endpoint for obtaining direct URL for the logs
> --
>
> Key: YARN-10215
> URL: https://issues.apache.org/jira/browse/YARN-10215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10025.001.patch
>
>
> If CORS-protected UIs are set up, there is an issue when the browser tries to 
> access the logs of a running container in the RM web UIv2.
> Assuming ATS is not up, the browser follows this call chain:
> - It tries to access ATS, fails, and falls back to the JHS.
> - From the RM the browser received basic app info, so we know that the 
> application is running.
> - From the JHS we got the list of containers and their log files.
> - When we try to access a specific log file, the JHS redirects the request to 
> the UI of the NM on which the container is running. This redirect is 
> performed by the browser automatically. In this setup the host is considered 
> protected information, so the browser omits the "Origin" field from the 
> request when this redirect is done. Access to the NodeManager's web UI is 
> then denied because of the CORS headers set up for the NM, since the Origin 
> is null in the redirected request.
> - Finally, a "Logs are unavailable" message is shown in the RM web UIv2 due to 
> the CORS violation.
> We should fix this. As an approach, we can expose another endpoint which only 
> returns the URL of the NodeManager, which we should call directly from the 
> UIv2 in order to receive the log. This adds a bit of complexity, but will 
> enable users to keep the CORS-protected setup.






[jira] [Comment Edited] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-03-31 Thread Siddharth Ahuja (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071542#comment-17071542
 ] 

Siddharth Ahuja edited comment on YARN-10207 at 3/31/20, 7:50 AM:
--

Hi [~adam.antal], thanks for your comments.

The leak happens when AggregatedLogFormat.LogReader is being instantiated, 
specifically when the TFile.Reader creation within 
AggregatedLogFormat.LogReader's constructor fails due to a corrupted file 
being passed in (see the stacktrace above).

The FSDataInputStream not being closed is what causes the leak.

The caller - TFileAggregatedLogsBlock.render(…) - does try to clean up the 
reader in the finally clause (see 
https://github.com/apache/hadoop/blob/460ba7fb14114f44e14a660f533f32c54e504478/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/tfile/TFileAggregatedLogsBlock.java#L153),
 but it assumes that the reader was created successfully. In our case, the 
reader is never created because its construction itself fails on the corrupted 
log.

The fix, therefore, is to catch any IOException within the 
AggregatedLogFormat.LogReader constructor itself, close all the relevant 
resources including the FSDataInputStream, and rethrow the exception to the 
caller (TFileAggregatedLogsBlock.render) so that it can catch and log it 
(https://github.com/apache/hadoop/blob/460ba7fb14114f44e14a660f533f32c54e504478/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/tfile/TFileAggregatedLogsBlock.java#L150).

This ensures that we don't leak connections etc. whenever the reader fails to 
instantiate (= new AggregatedLogFormat.LogReader).
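
A simplified sketch of that pattern (a plain InputStream stands in for the 
FSDataInputStream; the class and method names are illustrative, not the actual 
AggregatedLogFormat.LogReader code):

{code}
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

public class LeakSafeReader implements Closeable {

  private final InputStream in;

  public LeakSafeReader(InputStream in) throws IOException {
    this.in = in;
    try {
      parseHeader(in);          // may throw on a corrupted file
    } catch (IOException e) {
      in.close();               // release the descriptor before propagating
      throw e;
    }
  }

  private static void parseHeader(InputStream in) throws IOException {
    if (in.read() == -1) {
      throw new IOException("Not a valid file header");
    }
  }

  @Override
  public void close() throws IOException {
    in.close();
  }
}
{code}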

Based on your feedback, I performed functional testing with IndexedFormat 
(IFile) by setting the following properties inside yarn-site.xml:
{code}
    <property>
        <name>yarn.log-aggregation.file-formats</name>
        <value>IndexedFormat</value>
    </property>
    <property>
        <name>yarn.log-aggregation.file-controller.IndexedFormat.class</name>
        <value>org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController</value>
    </property>
    <property>
        <name>yarn.log-aggregation.IndexedFormat.remote-app-log-dir</name>
        <value>/tmp/ifilelogs</value>
    </property>
    <property>
        <name>yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix</name>
        <value>ifilelogs</value>
    </property>
{code}

Like the earlier scenario, I corrupted the IFile (aggregated log in HDFS) and 
tried to render it in the JHS Web UI; however, no leaks were found in this case.

This is the call flow:

IndexedFileAggregatedLogsBlock.render() -> 
LogAggregationIndexedFileController.loadIndexedLogsMeta(…)

An IOException is encountered inside this try block; however, notice the 
finally clause here -> 
https://github.com/apache/hadoop/blob/4af2556b48e01150851c7f273a254a16324ba843/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/ifile/LogAggregationIndexedFileController.java#L900.
 This helps clean up the socket connection by closing the FSDataInputStream.

You will notice that this is a different call stack from the TFile case, as 
there is no call to AggregatedLogFormat.LogReader, i.e. it is coded differently.
Regardless, thanks to that finally clause, it does end up cleaning up the 
connection, and there are no CLOSE_WAIT leaks when a corrupted log file is 
encountered. (The downside here is that only a WARN log is presented to the 
user in the JHS logs when rendering fails for Tfile logs, and there is no 
stacktrace logged from the exception here - 
https://github.com/apache/hadoop/blob/c24af4b0d6fc32938b076161b5a8c86d38e3e0a1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/ifile/IndexedFileAggregatedLogsBlock.java#L136
 as the exception is just swallowed inside the catch{} clause. This may 
warrant a separate JIRA.)

As part of this fix, I looked for any occurrences of "new TFile.Reader" that 
may cause connection leaks somewhere else. I found two :
# TFileDumper, see 
https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/file/tfile/TFileDumper.java#L103,
 and,
# FileSystemApplicationHistoryStore, see 
https://github.com/apache/hadoop/blob/7dac7e1d13eaf0eac04fe805c7502dcecd597979/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/FileSystemApplicationHistoryStore.java#L691

1 is not an issue because the FSDataInputStream is closed inside the finally{}
clause here:

[jira] [Updated] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-03-31 Thread Siddharth Ahuja (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Ahuja updated YARN-10207:
---
Attachment: YARN-10207.001.patch

> CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated 
> logs on the JobHistoryServer Web UI
> -
>
> Key: YARN-10207
> URL: https://issues.apache.org/jira/browse/YARN-10207
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: YARN-10207.001.patch
>
>
> File descriptor leaks are observed coming from the JobHistoryServer process 
> while it tries to render a "corrupted" aggregated log on the JHS Web UI.
> Issue reproduced using the following steps:
> # Ran a sample Hadoop MR Pi job; it had the id 
> application_1582676649923_0026.
> # Copied an aggregated log file from HDFS to local FS:
> {code}
> hdfs dfs -get 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Updated the TFile metadata at the bottom of this file with some junk to 
> corrupt the file:
> *Before:*
> {code}
>   
> ^@^GVERSION*(^@_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáP
> {code}
> *After:*
> {code}
>   
> ^@^GVERSION*(^@_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáPblah
> {code}
> Notice "blah" (junk) added at the very end.
> # Remove the existing aggregated log file that will be replaced by our 
> modified copy from step 3 (otherwise HDFS will refuse to place a file with 
> the same name, as it already exists):
> {code}
> hdfs dfs -rm -r -f 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Upload the corrupted aggregated file back to HDFS:
> {code}
> hdfs dfs -put _8041 
> /tmp/logs/systest/logs/application_1582676649923_0026
> {code}
> # Visit HistoryServer Web UI
> # Click on job_1582676649923_0026
> # Click on the "logs" link against the AM (assuming the AM ran on nm_hostname)
> # Review the JHS logs; the following exception will be seen:
> {code}
>   2020-03-24 20:03:48,484 ERROR org.apache.hadoop.yarn.webapp.View: Error 
> getting logs for job_1582676649923_0026
>   java.io.IOException: Not a valid BCFile.
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:628)
>   at 
> org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
>   at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:588)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.TFileAggregatedLogsBlock.render(TFileAggregatedLogsBlock.java:111)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.renderAggregatedLogsBlock(LogAggregationTFileController.java:341)
>   at 
> org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>   at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>   at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.logs(HsController.java:202)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
>