[jira] [Commented] (YARN-9581) LogsCli getAMContainerInfoForRMWebService ignores rm2

2019-05-24 Thread Tan, Wangda (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847962#comment-16847962
 ] 

Tan, Wangda commented on YARN-9581:
---

Nice catch, thanks [~Prabhu Joseph].

> LogsCli getAMContainerInfoForRMWebService ignores rm2
> -
>
> Key: YARN-9581
> URL: https://issues.apache.org/jira/browse/YARN-9581
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> The yarn logs command fails for a running job when RM HA is enabled and rm2 
> is the active RM.
> {code}
> hrt_qa@prabhuYarn:~> /usr/hdp/current/hadoop-yarn-client/bin/yarn  logs 
> -applicationId application_1558613472348_0004 -am 1
> 19/05/24 18:04:49 INFO client.AHSProxy: Connecting to Application History 
> server at prabhuYarn/172.27.23.55:10200
> 19/05/24 18:04:50 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Unable to get AM container informations for the 
> application:application_1558613472348_0004
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> https://prabhuYarn:8090/ws/v1/cluster/apps/application_1558613472348_0004/appattempts
> Can not get AMContainers logs for the 
> application:application_1558613472348_0004 with the appOwner:hrt_qa
> {code}
> LogsCli#getRMWebAppURLWithoutScheme only checks the first entry in the RM 
> list yarn.resourcemanager.ha.rm-ids.
> {code}
> yarnConfig.set(YarnConfiguration.RM_HA_ID, rmIds.get(0));
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9563) Resource report REST API could return NaN or Inf

2019-05-24 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847914#comment-16847914
 ] 

Jonathan Eagles commented on YARN-9563:
---

Both TestLeaderElectorService and TestCapacityOverTimePolicy are flaky tests, 
but can you address the small checkstyle issues mentioned in the report, 
[~ahussein]?

> Resource report REST API could return NaN or Inf
> 
>
> Key: YARN-9563
> URL: https://issues.apache.org/jira/browse/YARN-9563
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9563.001.patch, YARN-9563.002.patch, 
> YARN-9563.003.patch
>
>
> The Resource Manager's Cluster Applications and Cluster Application REST APIs 
> sometimes return invalid JSON. This was addressed in YARN-6082.
> However, that fix corrects the calculation at only one call site and does not 
> guarantee the problem is avoided. Likewise, generating NaN/Inf can break the 
> web GUI if the columns cannot render non-numeric values.
> The suggested fix is to check for NaN/Inf in the protobuf layer, replacing 
> NaN/Inf with 0.0f.
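
For illustration, a minimal sketch of the kind of guard the description 
suggests; the class and method names are illustrative assumptions, not the 
actual patch code:

{code}
// Sketch only: normalize a float before it is serialized so the
// REST report stays valid JSON; NaN and +/-Infinity map to 0.0f.
public final class ResourceReportSanitizer {
  private ResourceReportSanitizer() {
  }

  public static float sanitize(float value) {
    return (Float.isNaN(value) || Float.isInfinite(value)) ? 0.0f : value;
  }
}
{code}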



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9563) Resource report REST API could return NaN or Inf

2019-05-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847885#comment-16847885
 ] 

Hadoop QA commented on YARN-9563:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 28s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 2 new + 439 unchanged - 0 fixed = 441 total (was 439) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 52s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}130m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestLeaderElectorService |
|   | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9563 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12969676/YARN-9563.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9cfda2aea48f 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6d0e79c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24148/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 

[jira] [Issue Comment Deleted] (YARN-9563) Resource report REST API could return NaN or Inf

2019-05-24 Thread Ahmed Hussein (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated YARN-9563:

Comment: was deleted

(was: I checked that 2.8 has the same implementation and I could not spot 
differences between the two versions.)

> Resource report REST API could return NaN or Inf
> 
>
> Key: YARN-9563
> URL: https://issues.apache.org/jira/browse/YARN-9563
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9563.001.patch, YARN-9563.002.patch, 
> YARN-9563.003.patch
>
>
> The Resource Manager's Cluster Applications and Cluster Application REST APIs 
> sometimes return invalid JSON. This was addressed in YARN-6082.
> However, that fix corrects the calculation at only one call site and does not 
> guarantee the problem is avoided. Likewise, generating NaN/Inf can break the 
> web GUI if the columns cannot render non-numeric values.
> The suggested fix is to check for NaN/Inf in the protobuf layer, replacing 
> NaN/Inf with 0.0f.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9563) Resource report REST API could return NaN or Inf

2019-05-24 Thread Ahmed Hussein (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847882#comment-16847882
 ] 

Ahmed Hussein commented on YARN-9563:
-

I will create another patch for 2.8.

> Resource report REST API could return NaN or Inf
> 
>
> Key: YARN-9563
> URL: https://issues.apache.org/jira/browse/YARN-9563
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9563.001.patch, YARN-9563.002.patch, 
> YARN-9563.003.patch
>
>
> The Resource Manager's Cluster Applications and Cluster Application REST APIs 
> sometimes return invalid JSON. This was addressed in YARN-6082.
> However, that fix corrects the calculation at only one call site and does not 
> guarantee the problem is avoided. Likewise, generating NaN/Inf can break the 
> web GUI if the columns cannot render non-numeric values.
> The suggested fix is to check for NaN/Inf in the protobuf layer, replacing 
> NaN/Inf with 0.0f.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9563) Resource report REST API could return NaN or Inf

2019-05-24 Thread Ahmed Hussein (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847875#comment-16847875
 ] 

Ahmed Hussein commented on YARN-9563:
-

I checked that 2.8 has the same implementation and I could not spot differences 
between the two versions.

> Resource report REST API could return NaN or Inf
> 
>
> Key: YARN-9563
> URL: https://issues.apache.org/jira/browse/YARN-9563
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9563.001.patch, YARN-9563.002.patch, 
> YARN-9563.003.patch
>
>
> The Resource Manager's Cluster Applications and Cluster Application REST APIs 
> sometimes return invalid JSON. This was addressed in YARN-6082.
> However, that fix corrects the calculation at only one call site and does not 
> guarantee the problem is avoided. Likewise, generating NaN/Inf can break the 
> web GUI if the columns cannot render non-numeric values.
> The suggested fix is to check for NaN/Inf in the protobuf layer, replacing 
> NaN/Inf with 0.0f.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9563) Resource report REST API could return NaN or Inf

2019-05-24 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847840#comment-16847840
 ] 

Jonathan Eagles commented on YARN-9563:
---

I'm +1 on patch 003. I'll wait for the Hadoop QA results and give other 
reviewers some time before committing this. I'm guessing this patch is targeted 
at all release lines back to 2.8?

> Resource report REST API could return NaN or Inf
> 
>
> Key: YARN-9563
> URL: https://issues.apache.org/jira/browse/YARN-9563
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9563.001.patch, YARN-9563.002.patch, 
> YARN-9563.003.patch
>
>
> The Resource Manager's Cluster Applications and Cluster Application REST APIs 
> sometimes return invalid JSON. This was addressed in YARN-6082.
> However, that fix corrects the calculation at only one call site and does not 
> guarantee the problem is avoided. Likewise, generating NaN/Inf can break the 
> web GUI if the columns cannot render non-numeric values.
> The suggested fix is to check for NaN/Inf in the protobuf layer, replacing 
> NaN/Inf with 0.0f.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-24 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847833#comment-16847833
 ] 

Eric Badger commented on YARN-9560:
---

I've opened YARN-9582 to port the yarn sysfs feature to the new runtime.

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics.
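
For illustration, a minimal sketch of the proposed shape; the member names and 
environment-variable dispatch are illustrative assumptions, not the patch 
itself:

{code}
import java.util.Map;

// Sketch of the proposed hierarchy: shared DockerLinuxContainerRuntime
// logic (env parsing, mount checks, ...) moves up into the abstract class.
abstract class OCIContainerRuntime {
  abstract boolean isRuntimeRequested(Map<String, String> env);
}

class DockerLinuxContainerRuntime extends OCIContainerRuntime {
  @Override
  boolean isRuntimeRequested(Map<String, String> env) {
    return "docker".equals(env.get("YARN_CONTAINER_RUNTIME_TYPE"));
  }
}

class FSImageContainerRuntime extends OCIContainerRuntime {
  @Override
  boolean isRuntimeRequested(Map<String, String> env) {
    return "runc".equals(env.get("YARN_CONTAINER_RUNTIME_TYPE"));
  }
}
{code}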



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9582) Port YARN-8569 to FSImageContainerRuntime

2019-05-24 Thread Eric Badger (JIRA)
Eric Badger created YARN-9582:
-

 Summary: Port YARN-8569 to FSImageContainerRuntime
 Key: YARN-9582
 URL: https://issues.apache.org/jira/browse/YARN-9582
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Eric Badger


After YARN-9562 is merged, we should add the yarn sysfs support to the new 
runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9581) LogsCli getAMContainerInfoForRMWebService ignores rm2

2019-05-24 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9581:

Affects Version/s: 3.2.0

> LogsCli getAMContainerInfoForRMWebService ignores rm2
> -
>
> Key: YARN-9581
> URL: https://issues.apache.org/jira/browse/YARN-9581
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> The yarn logs command fails for a running job when RM HA is enabled and rm2 
> is the active RM.
> {code}
> hrt_qa@prabhuYarn:~> /usr/hdp/current/hadoop-yarn-client/bin/yarn  logs 
> -applicationId application_1558613472348_0004 -am 1
> 19/05/24 18:04:49 INFO client.AHSProxy: Connecting to Application History 
> server at prabhuYarn/172.27.23.55:10200
> 19/05/24 18:04:50 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Unable to get AM container informations for the 
> application:application_1558613472348_0004
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> https://prabhuYarn:8090/ws/v1/cluster/apps/application_1558613472348_0004/appattempts
> Can not get AMContainers logs for the 
> application:application_1558613472348_0004 with the appOwner:hrt_qa
> {code}
> LogsCli#getRMWebAppURLWithoutScheme only checks the first entry in the RM 
> list yarn.resourcemanager.ha.rm-ids.
> {code}
> yarnConfig.set(YarnConfiguration.RM_HA_ID, rmIds.get(0));
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9581) LogsCli getAMContainerInfoForRMWebService ignores rm2

2019-05-24 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9581:

Component/s: client

> LogsCli getAMContainerInfoForRMWebService ignores rm2
> -
>
> Key: YARN-9581
> URL: https://issues.apache.org/jira/browse/YARN-9581
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> The yarn logs command fails for a running job when RM HA is enabled and rm2 
> is the active RM.
> {code}
> hrt_qa@prabhuYarn:~> /usr/hdp/current/hadoop-yarn-client/bin/yarn  logs 
> -applicationId application_1558613472348_0004 -am 1
> 19/05/24 18:04:49 INFO client.AHSProxy: Connecting to Application History 
> server at prabhuYarn/172.27.23.55:10200
> 19/05/24 18:04:50 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Unable to get AM container informations for the 
> application:application_1558613472348_0004
> java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> https://prabhuYarn:8090/ws/v1/cluster/apps/application_1558613472348_0004/appattempts
> Can not get AMContainers logs for the 
> application:application_1558613472348_0004 with the appOwner:hrt_qa
> {code}
> LogsCli#getRMWebAppURLWithoutScheme only checks the first entry in the RM 
> list yarn.resourcemanager.ha.rm-ids.
> {code}
> yarnConfig.set(YarnConfiguration.RM_HA_ID, rmIds.get(0));
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9581) LogsCli getAMContainerInfoForRMWebService ignores rm2

2019-05-24 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-9581:
---

 Summary: LogsCli getAMContainerInfoForRMWebService ignores rm2
 Key: YARN-9581
 URL: https://issues.apache.org/jira/browse/YARN-9581
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


The yarn logs command fails for a running job when RM HA is enabled and rm2 is 
the active RM. 

{code}
hrt_qa@prabhuYarn:~> /usr/hdp/current/hadoop-yarn-client/bin/yarn  logs 
-applicationId application_1558613472348_0004 -am 1
19/05/24 18:04:49 INFO client.AHSProxy: Connecting to Application History 
server at prabhuYarn/172.27.23.55:10200
19/05/24 18:04:50 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm2
Unable to get AM container informations for the 
application:application_1558613472348_0004
java.io.IOException: 
org.apache.hadoop.security.authentication.client.AuthenticationException: Error 
while authenticating with endpoint: 
https://prabhuYarn:8090/ws/v1/cluster/apps/application_1558613472348_0004/appattempts

Can not get AMContainers logs for the 
application:application_1558613472348_0004 with the appOwner:hrt_qa
{code}

LogsCli#getRMWebAppURLWithoutScheme only checks the first entry in the RM list 
yarn.resourcemanager.ha.rm-ids.

{code}
yarnConfig.set(YarnConfiguration.RM_HA_ID, rmIds.get(0));
{code}
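
For illustration, a minimal sketch of the direction a fix could take: iterate 
over every configured RM ID instead of pinning the first one. {{fetchFromRm}} 
is a hypothetical stand-in for the actual REST call, not a real LogsCli method.

{code}
import java.io.IOException;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: try each RM from yarn.resourcemanager.ha.rm-ids in turn,
// falling through to the next one when an RM is standby or unreachable.
final class AmInfoFailoverSketch {
  static String getAMContainerInfo(YarnConfiguration conf, String appId)
      throws IOException {
    IOException last = new IOException("no RM ids configured");
    for (String rmId : conf.getStringCollection(YarnConfiguration.RM_HA_IDS)) {
      try {
        conf.set(YarnConfiguration.RM_HA_ID, rmId);
        return fetchFromRm(conf, appId); // hypothetical helper
      } catch (IOException e) {
        last = e; // this RM may be standby or down; try the next one
      }
    }
    throw last;
  }

  private static String fetchFromRm(YarnConfiguration conf, String appId)
      throws IOException {
    // placeholder for GET /ws/v1/cluster/apps/{appId}/appattempts
    throw new IOException("not implemented in this sketch");
  }
}
{code}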



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9563) Resource report REST API could return NaN or Inf

2019-05-24 Thread Ahmed Hussein (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847805#comment-16847805
 ] 

Ahmed Hussein commented on YARN-9563:
-

[~jeagles] I uploaded another patch modifying TestLeafQueue to check against 
NaN/Infinity. This test case will fail if someone modifies FiCaSchedulerApp's 
resource calculation without checking for a zero denominator.
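
For illustration, an assertion in that spirit might look like the following 
sketch; it is not the actual test code from the patch:

{code}
import org.junit.Assert;

// Sketch: the reported usage must stay finite even when the denominator
// in the resource calculation is zero.
final class NaNInfAssertions {
  static void assertFinite(String what, float value) {
    Assert.assertFalse(what + " must not be NaN", Float.isNaN(value));
    Assert.assertFalse(what + " must not be Inf", Float.isInfinite(value));
  }
}
{code}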

> Resource report REST API could return NaN or Inf
> 
>
> Key: YARN-9563
> URL: https://issues.apache.org/jira/browse/YARN-9563
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9563.001.patch, YARN-9563.002.patch, 
> YARN-9563.003.patch
>
>
> The Resource Manager's Cluster Applications and Cluster Application REST APIs 
> sometimes return invalid JSON. This was addressed in YARN-6082.
> However, that fix corrects the calculation at only one call site and does not 
> guarantee the problem is avoided. Likewise, generating NaN/Inf can break the 
> web GUI if the columns cannot render non-numeric values.
> The suggested fix is to check for NaN/Inf in the protobuf layer, replacing 
> NaN/Inf with 0.0f.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-24 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847804#comment-16847804
 ] 

Eric Yang commented on YARN-9560:
-

[~ebadger] Sounds reasonable to me.

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9563) Resource report REST API could return NaN or Inf

2019-05-24 Thread Ahmed Hussein (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated YARN-9563:

Attachment: YARN-9563.003.patch

> Resource report REST API could return NaN or Inf
> 
>
> Key: YARN-9563
> URL: https://issues.apache.org/jira/browse/YARN-9563
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9563.001.patch, YARN-9563.002.patch, 
> YARN-9563.003.patch
>
>
> The Resource Manager's Cluster Applications and Cluster Application REST APIs 
> sometimes return invalid JSON. This was addressed in YARN-6082.
> However, that fix corrects the calculation at only one call site and does not 
> guarantee the problem is avoided. Likewise, generating NaN/Inf can break the 
> web GUI if the columns cannot render non-numeric values.
> The suggested fix is to check for NaN/Inf in the protobuf layer, replacing 
> NaN/Inf with 0.0f.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-24 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847782#comment-16847782
 ] 

Eric Badger commented on YARN-9560:
---

bq. Are there two JSON formats used in OCIContainerRuntime? One for passing 
information between Java and C, and another passed to runc for execution?

In the patch for YARN-9562 there will be two formats: one for Java to C and one 
for C to runc. 

bq. If there are already two types of JSON messages set up for communication 
between Java <-> container-executor and container-executor <-> runc, then it 
would be better to have sysfs included in the communication between Java and 
container-executor. The container-executor binary needs to handle how to 
translate the flag into meaningful mount operations for runc.

Agreed that this is necessary for the yarn sysfs feature to work. However, we 
can make that change in a follow-up JIRA. I don't want to conflate this 
restructuring JIRA with features that will need extra code changes to support, 
such as changing the {{setYarnSysFS()}} method. The new runtime won't be using 
{{DockerRunCommand}}, since that is Docker specific. So to make way for the 
yarn sysfs feature in {{OCIContainerRuntime}}, I'd need to change 
{{setYarnSysFS()}} to something more general. This is something I'd like to 
avoid so that I can keep the changes as minimal as possible here and then make 
any non-trivial changes in follow-up JIRAs. That way we can minimize the patch 
size and the number of things we're changing. 

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9452) Fix failing testcases TestDistributedShell and TestTimelineAuthFilterForV2

2019-05-24 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847778#comment-16847778
 ] 

Prabhu Joseph commented on YARN-9452:
-

[~adam.antal] [~snemeth] Can you review this JIRA when you get time? It fixes 
the failing test cases from TestDistributedShell and TestTimelineAuthFilterForV2. 
The failing test case TestContainerSchedulerQueuing will be handled by YARN-9427. 

The TestDistributedShell test cases work fine with the patch.

> Fix failing testcases TestDistributedShell and TestTimelineAuthFilterForV2
> --
>
> Key: YARN-9452
> URL: https://issues.apache.org/jira/browse/YARN-9452
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2, distributed-shell, test
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9452-001.patch, YARN-9452-002.patch, 
> YARN-9452-003.patch
>
>
> *TestDistributedShell#testDSShellWithoutDomainV2CustomizedFlow*
> {code}
> [ERROR] 
> testDSShellWithoutDomainV2CustomizedFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 72.14 s  <<< FAILURE!
> java.lang.AssertionError: Entity ID prefix should be same across each publish 
> of same entity expected:<9223372036854775806> but was:<9223370482298585580>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.verifyEntityForTimelineV2(TestDistributedShell.java:695)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:588)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:459)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow(TestDistributedShell.java:330)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> *TestTimelineAuthFilterForV2#testPutTimelineEntities*
> {code}
> [ERROR] 
> testPutTimelineEntities[3](org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2)
>   Time elapsed: 1.047 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertNotNull(Assert.java:712)
>   at org.junit.Assert.assertNotNull(Assert.java:722)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.verifyEntity(TestTimelineAuthFilterForV2.java:282)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.testPutTimelineEntities(TestTimelineAuthFilterForV2.java:421)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> 

[jira] [Resolved] (YARN-9558) Log Aggregation testcases failing

2019-05-24 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang resolved YARN-9558.
-
Resolution: Fixed

Thanks, [~Prabhu Joseph]. Keeping this patch in Hadoop 3.3.0+ and marking it as 
resolved again.

> Log Aggregation testcases failing
> -
>
> Key: YARN-9558
> URL: https://issues.apache.org/jira/browse/YARN-9558
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, test
>Affects Versions: 3.3.0, 3.2.1, 3.1.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9558-001.patch, YARN-9558-002.patch, 
> YARN-9558-003.patch
>
>
> Test cases related to log aggregation in the classes below are failing:
> hadoop.yarn.server.nodemanager.webapp.TestNMWebServices 
> hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
>  
> hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices 
> hadoop.yarn.client.cli.TestLogsCLI 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9452) Fix failing testcases TestDistributedShell and TestTimelineAuthFilterForV2

2019-05-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847770#comment-16847770
 ] 

Hadoop QA commented on YARN-9452:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
39s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 56s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 28s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
22s{color} | {color:green} hadoop-yarn-server-tests in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
34s{color} | {color:green} hadoop-yarn-applications-distributedshell in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}122m 55s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce 

[jira] [Commented] (YARN-9558) Log Aggregation testcases failing

2019-05-24 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847741#comment-16847741
 ] 

Prabhu Joseph commented on YARN-9558:
-

Yes, sure [~eyang]. Thanks.

> Log Aggregation testcases failing
> -
>
> Key: YARN-9558
> URL: https://issues.apache.org/jira/browse/YARN-9558
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, test
>Affects Versions: 3.3.0, 3.2.1, 3.1.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9558-001.patch, YARN-9558-002.patch, 
> YARN-9558-003.patch
>
>
> Test cases related to log aggregation in the classes below are failing:
> hadoop.yarn.server.nodemanager.webapp.TestNMWebServices 
> hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
>  
> hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices 
> hadoop.yarn.client.cli.TestLogsCLI 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847723#comment-16847723
 ] 

Hadoop QA commented on YARN-9560:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 22s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 1 new + 10 unchanged - 2 fixed = 11 total (was 12) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 59s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
22s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9560 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12969658/YARN-9560.006.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9abb7eaccc63 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 460ba7f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24147/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24147/testReport/ |
| Max. process+thread count | 447 (vs. 

[jira] [Comment Edited] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-24 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847712#comment-16847712
 ] 

Eric Yang edited comment on YARN-9560 at 5/24/19 4:44 PM:
--

[~ebadger] Are there two JSON formats used in OCIContainerRuntime?  One for 
passing information between Java and C, and another passed to runc for 
execution?  If there is only one format, and the one passed from Java is 
consumed by runc, then I agree with you that it is not easy to pass this flag, 
and a follow-up JIRA makes sense to further develop communication filtering 
between Java, container-executor and runc.  If there are already two types of 
JSON messages set up for communication between Java <-> container-executor and 
container-executor <-> runc, then it would be better to have sysfs included in 
the communication between Java and container-executor.  The container-executor 
binary needs to handle how to translate the flag into meaningful mount 
operations for runc.


was (Author: eyang):
[~ebadger] Are there two JSON formats used in OCIContainerRuntime?  One for 
passing information between Java and C, and another passed to runc for 
execution?  If there is only one format, and the one passed from Java is 
consumed by runc, then I agree with you that it is not easy to pass this flag, 
and a follow-up JIRA makes sense to further develop communication filtering 
between Java, container-executor and runc.  If there are already two types of 
JSON messages set up for communication between Java <-> container-executor and 
container-executor <-> runc, then it would be better to have sysfs included in 
the communication between Java and container-executor.  The container-executor 
binary needs to handle how to translate the flag into meaningful mount 
operations for runc.

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-24 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847712#comment-16847712
 ] 

Eric Yang commented on YARN-9560:
-

[~ebadger] Are there two JSON formats used in OCIContainerRuntime?  One for 
passing information between Java and C, and another passed to runc for 
execution?  If there is only one format, and the one passed from Java is 
consumed by runc, then I agree with you that it is not easy to pass this flag, 
and a follow-up JIRA makes sense to further develop communication filtering 
between Java, container-executor and runc.  If there are already two types of 
JSON messages set up for communication between Java <-> container-executor and 
container-executor <-> runc, then it would be better to have sysfs included in 
the communication between Java and container-executor.  The container-executor 
binary needs to handle how to translate the flag into meaningful mount 
operations for runc.

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9525) IFile format is not working against s3a remote folder

2019-05-24 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847698#comment-16847698
 ] 

Adam Antal commented on YARN-9525:
--

It looks like we still have some problems with IFile; I got the following 
error:

{noformat}
Cannot seek to a negative offset -4
{noformat}

From LogAggregationIndexedFileController.java it looks like we are not writing 
out full stack traces, but it probably originates from {{loadIndexedLogsMeta}}, 
where we do a seek with a negative offset. It must be some byte-magic similar 
to the first issue; I will look into it more deeply tomorrow.
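
For reference, a hedged sketch of the kind of guard that would surface this 
more clearly; the trailer-size arithmetic is an assumption about where the 
negative offset comes from, not the actual {{loadIndexedLogsMeta}} code:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

// Sketch: validate the computed offset before seeking, since a short or
// partially written log file can make (fileLength - trailerSize) negative.
final class TrailerSeekSketch {
  static void seekToTrailer(FSDataInputStream in, long fileLength,
      long trailerSize) throws IOException {
    long offset = fileLength - trailerSize;
    if (offset < 0) {
      throw new IOException("Log file too short for indexed trailer: length="
          + fileLength + ", trailer=" + trailerSize);
    }
    in.seek(offset);
  }
}
{code}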

> IFile format is not working against s3a remote folder
> -
>
> Key: YARN-9525
> URL: https://issues.apache.org/jira/browse/YARN-9525
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: IFile-S3A-POC01.patch, YARN-9525-001.patch
>
>
> Using the indexed file format with {{yarn.nodemanager.remote-app-log-dir}} 
> configured to an s3a URI throws the following exception during log 
> aggregation:
> {noformat}
> Cannot create writer for app application_1556199768861_0001. Skip log upload 
> this time. 
> java.io.IOException: java.io.FileNotFoundException: No such file or 
> directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195)
>   ... 7 more
> {noformat}
> This stack trace points to 
> {{LogAggregationIndexedFileController#initializeWriter}}, where we do the 
> following steps (in a non-rolling log aggregation setup):
> - create an FSDataOutputStream
> - write out a UUID
> - flush
> - immediately after that, call getFileStatus to get the length of the log 
> file (the bytes we just wrote out), and that's where the failure happens: 
> the file is not there yet due to eventual consistency.
> Maybe we can get rid of that step, so we can use the IFile format against an 
> s3a target.
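
For illustration, a minimal sketch of the write-then-stat sequence described 
above ({{FileSystem}} is used for brevity; the controller itself goes through 
{{FileContext}}, and the names here are illustrative):

{code}
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: on an eventually consistent store the getFileStatus call can
// throw FileNotFoundException even though the bytes were just flushed.
final class WriteThenStatSketch {
  static long writeUuidThenStat(Configuration conf, Path logFile)
      throws Exception {
    FileSystem fs = logFile.getFileSystem(conf);
    FSDataOutputStream out = fs.create(logFile);
    out.write(UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8));
    out.hflush(); // flushed, but s3a may not expose the object yet
    // The stream is intentionally left open, mirroring the described flow.
    // This is the step that fails against s3a in the report above.
    return fs.getFileStatus(logFile).getLen();
  }
}
{code}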



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9512) [JDK11] TestAuxServices#testCustomizedAuxServiceClassPath ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLC

2019-05-24 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847689#comment-16847689
 ] 

Adam Antal commented on YARN-9512:
--

I did some investigation, but I don't have a proposed solution for this yet.

Some related articles and resources:
- It is a known [migration 
problem|https://blog.codefx.org/java/java-9-migration-guide/#Casting-To-URL-Class-Loader]
 from Java 8 to 9. The article describes some cases where the migration is 
easy, but none of them is applicable here.
- The [Oracle community|https://community.oracle.com/thread/4011800] also has 
a thread about this problem; it might be worth chasing the option suggested 
there.

Most likely it is just a testing issue, but I am still unsure about this.

I'm also CCing [~billie.rinaldi], who has done some work in this area. If you 
have any thoughts on this, we'd be happy to hear them.

Other components have bumped into this as well - see the similar TEZ-3860.
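
For reference, the failing pattern and one possible workaround (a rough 
sketch; the jar path and variable names below are hypothetical, not the 
test's actual code):
{code:java}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderCastDemo {
  public static void main(String[] args) throws Exception {
    // On JDK 8 the application class loader happened to be a URLClassLoader,
    // so this cast worked; on JDK 9+ it is an internal AppClassLoader and
    // the cast throws the ClassCastException quoted below.
    ClassLoader cl = Thread.currentThread().getContextClassLoader();
    URL[] urls = ((URLClassLoader) cl).getURLs();  // fails on JDK 11

    // One portable alternative: build an explicit URLClassLoader over the
    // jars the test needs instead of casting the inherited loader.
    URLClassLoader isolated = new URLClassLoader(
        new URL[] { new File("/tmp/aux-service.jar").toURI().toURL() },
        ClassLoader.getSystemClassLoader());
    System.out.println(isolated.getURLs().length);
  }
}
{code}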

> [JDK11] TestAuxServices#testCustomizedAuxServiceClassPath ClassCastException: 
> class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
> java.net.URLClassLoader
> ---
>
> Key: YARN-9512
> URL: https://issues.apache.org/jira/browse/YARN-9512
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Siyao Meng
>Assignee: Adam Antal
>Priority: Major
>
> Found in maven JDK 11 unit test run. Compiled on JDK 8:
> {code}
> [ERROR] 
> testCustomizedAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices)
>   Time elapsed: 0.019 s  <<< ERROR!java.lang.ClassCastException: class 
> jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
> java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and 
> java.net.URLClassLoader are in module java.base of loader 'bootstrap')
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices$ServiceC.getMetaData(TestAuxServices.java:197)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStart(AuxServices.java:315)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testCustomizedAuxServiceClassPath(TestAuxServices.java:344)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9558) Log Aggregation testcases failing

2019-05-24 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847685#comment-16847685
 ] 

Eric Yang commented on YARN-9558:
-

[~Prabhu Joseph] I think it is probably better to keep these changes in 3.3.0 
only, because they introduce a new configuration flag 
(NM_REMOTE_APP_LOG_DIR_INCLUDE_OLDER) and new behavior for locating log files. 
It is better not to introduce new behavior or flags in a patch-version 
backport, because upstream configuration management utilities do not know 
about the new flag and log structure. This reduces the probability of 
introducing incompatible changes that we might not otherwise notice. If you 
agree, I will reset the target version to 3.3.0 only.

> Log Aggregation testcases failing
> -
>
> Key: YARN-9558
> URL: https://issues.apache.org/jira/browse/YARN-9558
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, test
>Affects Versions: 3.3.0, 3.2.1, 3.1.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9558-001.patch, YARN-9558-002.patch, 
> YARN-9558-003.patch
>
>
> Test cases related to Log Aggregation from below classes are failing
> hadoop.yarn.server.nodemanager.webapp.TestNMWebServices 
> hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
>  
> hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices 
> hadoop.yarn.client.cli.TestLogsCLI 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8625) Aggregate Resource Allocation for each job is not present in ATS

2019-05-24 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847663#comment-16847663
 ] 

Prabhu Joseph commented on YARN-8625:
-

[~eepayne] The branch-2.7 checkstyle issues are ignorable. The branch-2.8 
asflicense issue looks unrelated. YARN-9558 is fixed in trunk; I have 
validated TestAHSWebServices locally.

> Aggregate Resource Allocation for each job is not present in ATS
> 
>
> Key: YARN-8625
> URL: https://issues.apache.org/jira/browse/YARN-8625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 2.7.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: 0001-YARN-8625.patch, 0002-YARN-8625.patch, 
> ApplicationHistoryServer_Rest_Api.png, ApplicationHistoryServer_UI.png, 
> YARN-8625-branch-2.7.001.patch, YARN-8625-branch-2.8.001.patch, yarn-site.xml
>
>
> Aggregate Resource Allocation, shown on the RM UI for a finished job, is a 
> very useful metric for understanding how much resource a job has consumed. 
> But it does not get stored in ATS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-24 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847658#comment-16847658
 ] 

Eric Badger commented on YARN-9560:
---

Patch 006 adds the Private and Unstable flags to {{OCIContainerRuntime}} 

bq. It is basically a forward-and-pass operation to make sure that the 
downstream C side of the code receives this flag and performs the necessary 
operation to set up the sysfs directory in the container working directory. 
The sysfs directory will be populated through an async REST API call with a 
JSON file that contains the application structure, i.e. the IP addresses and 
host names of the containers. In this case, passing the flag as part of the 
JSON to container-executor is sufficient.

I still think we should add this in a follow-up JIRA. The current code is 
Docker-specific since it is a method of {{DockerRunCommand}}. If you agree, I 
can file a follow-up JIRA.

bq. I will do tests, and probably try with and without ENTRYPOINT to make sure 
it's well covered.

Thanks! I appreciate it.

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics
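> A minimal compilable sketch of that hierarchy (method names are 
> hypothetical, only to illustrate where shared code would live):
> {code:java}
> // Shared OCI plumbing (mounts, env handling, launch glue) moves up here.
> abstract class OCIContainerRuntime {
>   protected abstract String runtimeType();
> }
>
> // The existing Docker runtime keeps only its Docker-specific parts.
> class DockerLinuxContainerRuntime extends OCIContainerRuntime {
>   @Override protected String runtimeType() { return "docker"; }
> }
>
> // The new squashFS/runc runtime (name negotiable, as noted above).
> class FSImageContainerRuntime extends OCIContainerRuntime {
>   @Override protected String runtimeType() { return "fsimage"; }
> }
> {code}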



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-24 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-9560:
--
Attachment: YARN-9560.006.patch

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9452) Fix failing testcases TestDistributedShell and TestTimelineAuthFilterForV2

2019-05-24 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9452:

Summary: Fix failing testcases TestDistributedShell and 
TestTimelineAuthFilterForV2  (was: Timeline related testcases are failing)

> Fix failing testcases TestDistributedShell and TestTimelineAuthFilterForV2
> --
>
> Key: YARN-9452
> URL: https://issues.apache.org/jira/browse/YARN-9452
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2, test
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9452-001.patch, YARN-9452-002.patch, 
> YARN-9452-003.patch
>
>
> *TestDistributedShell#testDSShellWithoutDomainV2CustomizedFlow*
> {code}
> [ERROR] 
> testDSShellWithoutDomainV2CustomizedFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 72.14 s  <<< FAILURE!
> java.lang.AssertionError: Entity ID prefix should be same across each publish 
> of same entity expected:<9223372036854775806> but was:<9223370482298585580>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.verifyEntityForTimelineV2(TestDistributedShell.java:695)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:588)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:459)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow(TestDistributedShell.java:330)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> *TestTimelineAuthFilterForV2#testPutTimelineEntities*
> {code}
> [ERROR] 
> testPutTimelineEntities[3](org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2)
>   Time elapsed: 1.047 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertNotNull(Assert.java:712)
>   at org.junit.Assert.assertNotNull(Assert.java:722)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.verifyEntity(TestTimelineAuthFilterForV2.java:282)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.testPutTimelineEntities(TestTimelineAuthFilterForV2.java:421)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 

[jira] [Updated] (YARN-9452) Timeline related testcases are failing

2019-05-24 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9452:

Description: 
*TestDistributedShell#testDSShellWithoutDomainV2CustomizedFlow*
{code}
[ERROR] 
testDSShellWithoutDomainV2CustomizedFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 72.14 s  <<< FAILURE!
java.lang.AssertionError: Entity ID prefix should be same across each publish 
of same entity expected:<9223372036854775806> but was:<9223370482298585580>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.verifyEntityForTimelineV2(TestDistributedShell.java:695)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:588)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:459)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow(TestDistributedShell.java:330)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}


*TestTimelineAuthFilterForV2#testPutTimelineEntities*
{code}
[ERROR] 
testPutTimelineEntities[3](org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2)
  Time elapsed: 1.047 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertNotNull(Assert.java:712)
at org.junit.Assert.assertNotNull(Assert.java:722)
at 
org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.verifyEntity(TestTimelineAuthFilterForV2.java:282)
at 
org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.testPutTimelineEntities(TestTimelineAuthFilterForV2.java:421)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runners.Suite.runChild(Suite.java:128)
at 

[jira] [Updated] (YARN-9452) Fix failing testcases TestDistributedShell and TestTimelineAuthFilterForV2

2019-05-24 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9452:

Component/s: distributed-shell

> Fix failing testcases TestDistributedShell and TestTimelineAuthFilterForV2
> --
>
> Key: YARN-9452
> URL: https://issues.apache.org/jira/browse/YARN-9452
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2, distributed-shell, test
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9452-001.patch, YARN-9452-002.patch, 
> YARN-9452-003.patch
>
>
> *TestDistributedShell#testDSShellWithoutDomainV2CustomizedFlow*
> {code}
> [ERROR] 
> testDSShellWithoutDomainV2CustomizedFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 72.14 s  <<< FAILURE!
> java.lang.AssertionError: Entity ID prefix should be same across each publish 
> of same entity expected:<9223372036854775806> but was:<9223370482298585580>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.verifyEntityForTimelineV2(TestDistributedShell.java:695)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:588)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:459)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow(TestDistributedShell.java:330)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> *TestTimelineAuthFilterForV2#testPutTimelineEntities*
> {code}
> [ERROR] 
> testPutTimelineEntities[3](org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2)
>   Time elapsed: 1.047 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertNotNull(Assert.java:712)
>   at org.junit.Assert.assertNotNull(Assert.java:722)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.verifyEntity(TestTimelineAuthFilterForV2.java:282)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.testPutTimelineEntities(TestTimelineAuthFilterForV2.java:421)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> 

[jira] [Updated] (YARN-9452) Timeline related testcases are failing

2019-05-24 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9452:

Description: 
*TestDistributedShell#testDSShellWithoutDomainV2CustomizedFlow*
{code}
[ERROR] 
testDSShellWithoutDomainV2CustomizedFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 72.14 s  <<< FAILURE!
java.lang.AssertionError: Entity ID prefix should be same across each publish 
of same entity expected:<9223372036854775806> but was:<9223370482298585580>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.verifyEntityForTimelineV2(TestDistributedShell.java:695)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:588)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:459)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow(TestDistributedShell.java:330)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}


*TestTimelineAuthFilterForV2#testPutTimelineEntities*
{code}
[ERROR] 
testPutTimelineEntities[3](org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2)
  Time elapsed: 1.047 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertNotNull(Assert.java:712)
at org.junit.Assert.assertNotNull(Assert.java:722)
at 
org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.verifyEntity(TestTimelineAuthFilterForV2.java:282)
at 
org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.testPutTimelineEntities(TestTimelineAuthFilterForV2.java:421)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runners.Suite.runChild(Suite.java:128)
at 

[jira] [Commented] (YARN-9573) DistributedShell cannot specify LogAggregationContext

2019-05-24 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847644#comment-16847644
 ] 

Adam Antal commented on YARN-9573:
--

Thanks for the reviews. Just to be safe, let's wait until YARN-9425 gets 
committed; I don't want to accidentally mess up something in the background.

> DistributedShell cannot specify LogAggregationContext
> -
>
> Key: YARN-9573
> URL: https://issues.apache.org/jira/browse/YARN-9573
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: distributed-shell, log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9573.001.patch
>
>
> When DShell sends the application request object to the RM, it doesn't 
> specify a LogAggregationContext object - thus it is not possible to run 
> DShell with various log-aggregation configurations, e.g. rolling-fashion 
> log aggregation (see the sketch below).
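> A rough sketch of what DShell could set on its ApplicationSubmissionContext 
> (assuming the four-argument {{LogAggregationContext#newInstance}} overload 
> with rolled-log patterns; the patterns and the {{appContext}} variable are 
> just examples):
> {code:java}
> import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
> import org.apache.hadoop.yarn.api.records.LogAggregationContext;
>
> // include everything in the final upload, roll stdout/stderr while running
> LogAggregationContext logCtx = LogAggregationContext.newInstance(
>     ".*",               // includePattern
>     null,               // excludePattern
>     "std(out|err).*",   // rolledLogsIncludePattern
>     null);              // rolledLogsExcludePattern
> appContext.setLogAggregationContext(logCtx);
> {code}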



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9452) Timeline related testcases are failing

2019-05-24 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9452:

Attachment: YARN-9452-003.patch

> Timeline related testcases are failing
> --
>
> Key: YARN-9452
> URL: https://issues.apache.org/jira/browse/YARN-9452
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2, test
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9452-001.patch, YARN-9452-002.patch, 
> YARN-9452-003.patch
>
>
> Timeline related testcases are failing.
> TestDistributedShell#testDSShellWithoutDomainV2CustomizedFlow 
> {code}
> [ERROR] 
> testDSShellWithoutDomainV2CustomizedFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 72.14 s  <<< FAILURE!
> java.lang.AssertionError: Entity ID prefix should be same across each publish 
> of same entity expected:<9223372036854775806> but was:<9223370482298585580>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.verifyEntityForTimelineV2(TestDistributedShell.java:695)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:588)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:459)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow(TestDistributedShell.java:330)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> TestTimelineAuthFilterForV2#testPutTimelineEntities 
> {code}
> [ERROR] 
> testPutTimelineEntities[3](org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2)
>   Time elapsed: 1.047 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertNotNull(Assert.java:712)
>   at org.junit.Assert.assertNotNull(Assert.java:722)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.verifyEntity(TestTimelineAuthFilterForV2.java:282)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.testPutTimelineEntities(TestTimelineAuthFilterForV2.java:421)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> 

[jira] [Resolved] (YARN-9145) [Umbrella] Dynamically add or remove auxiliary services

2019-05-24 Thread Billie Rinaldi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi resolved YARN-9145.
--
   Resolution: Fixed
Fix Version/s: 3.3.0

Yes, thanks for reminding me; I had forgotten to close the umbrella!

> [Umbrella] Dynamically add or remove auxiliary services
> ---
>
> Key: YARN-9145
> URL: https://issues.apache.org/jira/browse/YARN-9145
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Fix For: 3.3.0
>
>
> Umbrella to track tasks supporting adding, removing, or updating auxiliary 
> services without NM restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9145) [Umbrella] Dynamically add or remove auxiliary services

2019-05-24 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847576#comment-16847576
 ] 

Adam Antal commented on YARN-9145:
--

Hi [~billie.rinaldi], is this considered feature-complete?

> [Umbrella] Dynamically add or remove auxiliary services
> ---
>
> Key: YARN-9145
> URL: https://issues.apache.org/jira/browse/YARN-9145
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
>
> Umbrella to track tasks supporting adding, removing, or updating auxiliary 
> services without NM restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9511) [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436

2019-05-24 Thread Adam Antal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal reassigned YARN-9511:


Assignee: Adam Antal

> [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: 
> The remote jarfile should not be writable by group or others. The current 
> Permission is 436
> ---
>
> Key: YARN-9511
> URL: https://issues.apache.org/jira/browse/YARN-9511
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Siyao Meng
>Assignee: Adam Antal
>Priority: Major
>
> Found in maven JDK 11 unit test run. Compiled on JDK 8.
> {code}
> [ERROR] 
> testRemoteAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices)
>   Time elapsed: 0.551 s  <<< 
> ERROR!org.apache.hadoop.yarn.exceptions.YarnRuntimeException: The remote 
> jarfile should not be writable by group or others. The current Permission is 
> 436
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:202)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testRemoteAuxServiceClassPath(TestAuxServices.java:268)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
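> Note that the permission is logged in decimal: 436 decimal is 0664 octal 
> (rw-rw-r--), i.e. group-writable, which is exactly what the check rejects. 
> A quick sanity check:
> {code:java}
> // 436 (decimal) rendered in octal -> "664", i.e. rw-rw-r--
> System.out.println(Integer.toOctalString(436));
> {code}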



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9558) Log Aggregation testcases failing

2019-05-24 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847529#comment-16847529
 ] 

Prabhu Joseph commented on YARN-9558:
-

Thanks [~eyang]. YARN-9558 also requires YARN-6929 + YARN-9524. Can we include 
all three in branch-3.2 and branch-3.1 as well?

For branch-3.2, it works by applying YARN-6929-011.patch, YARN-9524-002.patch 
and then YARN-9558-003.patch.

For branch-3.1, it works by applying YARN-6929-branch-3.1.001.patch, 
YARN-9524-002.patch and then YARN-9558-003.patch.

I have submitted YARN-6929-branch-3.1.001.patch on the YARN-6929 JIRA.

> Log Aggregation testcases failing
> -
>
> Key: YARN-9558
> URL: https://issues.apache.org/jira/browse/YARN-9558
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, test
>Affects Versions: 3.3.0, 3.2.1, 3.1.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9558-001.patch, YARN-9558-002.patch, 
> YARN-9558-003.patch
>
>
> Test cases related to Log Aggregation from below classes are failing
> hadoop.yarn.server.nodemanager.webapp.TestNMWebServices 
> hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
>  
> hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices 
> hadoop.yarn.client.cli.TestLogsCLI 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9512) [JDK11] TestAuxServices#testCustomizedAuxServiceClassPath ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLCl

2019-05-24 Thread Adam Antal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal reassigned YARN-9512:


Assignee: Adam Antal

> [JDK11] TestAuxServices#testCustomizedAuxServiceClassPath ClassCastException: 
> class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
> java.net.URLClassLoader
> ---
>
> Key: YARN-9512
> URL: https://issues.apache.org/jira/browse/YARN-9512
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Siyao Meng
>Assignee: Adam Antal
>Priority: Major
>
> Found in maven JDK 11 unit test run. Compiled on JDK 8:
> {code}
> [ERROR] 
> testCustomizedAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices)
>   Time elapsed: 0.019 s  <<< ERROR!java.lang.ClassCastException: class 
> jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
> java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and 
> java.net.URLClassLoader are in module java.base of loader 'bootstrap')
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices$ServiceC.getMetaData(TestAuxServices.java:197)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStart(AuxServices.java:315)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testCustomizedAuxServiceClassPath(TestAuxServices.java:344)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable

2019-05-24 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-6929:

Attachment: (was: YARN-6929-branch-3.2.001.patch)

> yarn.nodemanager.remote-app-log-dir structure is not scalable
> -
>
> Key: YARN-6929
> URL: https://issues.apache.org/jira/browse/YARN-6929
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-6929-007.patch, YARN-6929-008.patch, 
> YARN-6929-009.patch, YARN-6929-010.patch, YARN-6929-011.patch, 
> YARN-6929-branch-3.1.001.patch, YARN-6929.1.patch, YARN-6929.2.patch, 
> YARN-6929.2.patch, YARN-6929.3.patch, YARN-6929.4.patch, YARN-6929.5.patch, 
> YARN-6929.6.patch, YARN-6929.patch
>
>
> The current directory structure for yarn.nodemanager.remote-app-log-dir is 
> not scalable. The maximum subdirectory limit is 1048576 by default 
> (HDFS-6102). With a retention (yarn.log-aggregation.retain-seconds) of 7 
> days, there is a greater chance that LogAggregationService fails to create a 
> new directory with FSLimitException$MaxDirectoryItemsExceededException.
> The current structure is 
> <remote-app-log-dir>/<user>/logs/<applicationId>. This can be 
> improved by adding the date as a subdirectory, like 
> <remote-app-log-dir>/<user>/logs/<date>/<applicationId>.
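> A minimal sketch of deriving the dated subdirectory (variable names are 
> hypothetical):
> {code:java}
> import java.text.SimpleDateFormat;
> import java.util.Date;
> import org.apache.hadoop.fs.Path;
>
> // e.g. <remote-app-log-dir>/<user>/logs/20190524/<applicationId>
> String date = new SimpleDateFormat("yyyyMMdd").format(new Date());
> Path appLogDir = new Path(remoteRootLogDir,
>     user + "/logs/" + date + "/" + appId);
> {code}
> The failure seen today: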
> {code:java}
> WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  Application failed to init aggregation 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
>  
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
>  
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>  
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:415) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>  
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) 
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> 

[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable

2019-05-24 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-6929:

Attachment: YARN-6929-branch-3.2.001.patch

> yarn.nodemanager.remote-app-log-dir structure is not scalable
> -
>
> Key: YARN-6929
> URL: https://issues.apache.org/jira/browse/YARN-6929
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-6929-007.patch, YARN-6929-008.patch, 
> YARN-6929-009.patch, YARN-6929-010.patch, YARN-6929-011.patch, 
> YARN-6929-branch-3.1.001.patch, YARN-6929-branch-3.2.001.patch, 
> YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, YARN-6929.3.patch, 
> YARN-6929.4.patch, YARN-6929.5.patch, YARN-6929.6.patch, YARN-6929.patch
>
>
> The current directory structure for yarn.nodemanager.remote-app-log-dir is 
> not scalable. The maximum subdirectory limit is 1048576 by default 
> (HDFS-6102). With a retention (yarn.log-aggregation.retain-seconds) of 7 
> days, there is a greater chance that LogAggregationService fails to create a 
> new directory with FSLimitException$MaxDirectoryItemsExceededException.
> The current structure is 
> <remote-app-log-dir>/<user>/logs/<applicationId>. This can be 
> improved by adding the date as a subdirectory, like 
> <remote-app-log-dir>/<user>/logs/<date>/<applicationId>.
> {code:java}
> WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  Application failed to init aggregation 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
>  
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
>  
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>  
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:415) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>  
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) 
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> 

[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable

2019-05-24 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-6929:

Attachment: (was: YARN-6929-branch-3.2.001.patch)

> yarn.nodemanager.remote-app-log-dir structure is not scalable
> -
>
> Key: YARN-6929
> URL: https://issues.apache.org/jira/browse/YARN-6929
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-6929-007.patch, YARN-6929-008.patch, 
> YARN-6929-009.patch, YARN-6929-010.patch, YARN-6929-011.patch, 
> YARN-6929-branch-3.1.001.patch, YARN-6929.1.patch, YARN-6929.2.patch, 
> YARN-6929.2.patch, YARN-6929.3.patch, YARN-6929.4.patch, YARN-6929.5.patch, 
> YARN-6929.6.patch, YARN-6929.patch
>
>
> The current directory structure for yarn.nodemanager.remote-app-log-dir is 
> not scalable. The maximum subdirectory limit is 1048576 by default 
> (HDFS-6102). With a retention (yarn.log-aggregation.retain-seconds) of 7 
> days, there is a greater chance that LogAggregationService fails to create a 
> new directory with FSLimitException$MaxDirectoryItemsExceededException.
> The current structure is 
> <remote-app-log-dir>/<user>/logs/<applicationId>. This can be 
> improved by adding the date as a subdirectory, like 
> <remote-app-log-dir>/<user>/logs/<date>/<applicationId>.
> {code:java}
> WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  Application failed to init aggregation 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
>  
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
>  
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>  
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:415) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>  
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) 
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> 
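
To illustrate the proposed layout, here is a minimal sketch of how a 
date-based aggregated-log path could be derived. The buildDatedLogDir helper, 
the yyyy/MM/dd date format, and the hard-coded "logs" suffix are illustrative 
assumptions, not the actual patch code.

{code:java}
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

import org.apache.hadoop.fs.Path;

public final class DatedLogDirExample {
  private static final DateTimeFormatter DATE_FMT =
      DateTimeFormatter.ofPattern("yyyy/MM/dd");

  /**
   * Builds <remote-app-log-dir>/<user>/logs/<date>/<applicationId> instead of
   * the flat <remote-app-log-dir>/<user>/logs/<applicationId>, so only one
   * day's applications share a directory and the per-directory item limit
   * (1048576 by default, see HDFS-6102) is much harder to hit.
   */
  static Path buildDatedLogDir(Path remoteRootLogDir, String user,
      String appId, LocalDate date) {
    return new Path(remoteRootLogDir,
        user + "/logs/" + date.format(DATE_FMT) + "/" + appId);
  }

  public static void main(String[] args) {
    Path dir = buildDatedLogDir(new Path("/app-logs"), "yarn",
        "application_1558613472348_0004", LocalDate.of(2019, 5, 24));
    // Prints: /app-logs/yarn/logs/2019/05/24/application_1558613472348_0004
    System.out.println(dir);
  }
}
{code}

Retention cleanup could then delete whole date subdirectories once they fall 
outside yarn.log-aggregation.retain-seconds, instead of scanning one huge 
directory.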

[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable

2019-05-24 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-6929:

Attachment: YARN-6929-branch-3.2.001.patch

> yarn.nodemanager.remote-app-log-dir structure is not scalable
> -
>
> Key: YARN-6929
> URL: https://issues.apache.org/jira/browse/YARN-6929
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-6929-007.patch, YARN-6929-008.patch, 
> YARN-6929-009.patch, YARN-6929-010.patch, YARN-6929-011.patch, 
> YARN-6929-branch-3.1.001.patch, YARN-6929-branch-3.2.001.patch, 
> YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, YARN-6929.3.patch, 
> YARN-6929.4.patch, YARN-6929.5.patch, YARN-6929.6.patch, YARN-6929.patch
>
>
> The current directory structure for yarn.nodemanager.remote-app-log-dir is 
> not scalable. The maximum subdirectory limit is 1048576 by default 
> (HDFS-6102). With a yarn.log-aggregation.retain-seconds retention of 7 days, 
> there is a higher chance that LogAggregationService fails to create a new 
> directory with FSLimitException$MaxDirectoryItemsExceededException.
> The current structure is 
> <remote-app-log-dir>/<user>/logs/<applicationId>. This can be 
> improved by adding the date as a subdirectory, like 
> <remote-app-log-dir>/<user>/logs/<date>/<applicationId>
> {code:java}
> WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  Application failed to init aggregation 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
>  
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
>  
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>  
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:415) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>  
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) 
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> 

[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable

2019-05-24 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-6929:

Attachment: YARN-6929-branch-3.1.001.patch

> yarn.nodemanager.remote-app-log-dir structure is not scalable
> -
>
> Key: YARN-6929
> URL: https://issues.apache.org/jira/browse/YARN-6929
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-6929-007.patch, YARN-6929-008.patch, 
> YARN-6929-009.patch, YARN-6929-010.patch, YARN-6929-011.patch, 
> YARN-6929-branch-3.1.001.patch, YARN-6929.1.patch, YARN-6929.2.patch, 
> YARN-6929.2.patch, YARN-6929.3.patch, YARN-6929.4.patch, YARN-6929.5.patch, 
> YARN-6929.6.patch, YARN-6929.patch
>
>
> The current directory structure for yarn.nodemanager.remote-app-log-dir is 
> not scalable. The maximum subdirectory limit is 1048576 by default 
> (HDFS-6102). With a yarn.log-aggregation.retain-seconds retention of 7 days, 
> there is a higher chance that LogAggregationService fails to create a new 
> directory with FSLimitException$MaxDirectoryItemsExceededException.
> The current structure is 
> <remote-app-log-dir>/<user>/logs/<applicationId>. This can be 
> improved by adding the date as a subdirectory, like 
> <remote-app-log-dir>/<user>/logs/<date>/<applicationId>
> {code:java}
> WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  Application failed to init aggregation 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
>  
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
>  
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>  
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:415) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>  
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) 
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> 

[jira] [Updated] (YARN-9573) DistributedShell cannot specify LogAggregationContext

2019-05-24 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-9573:
-
Summary: DistributedShell cannot specify LogAggregationContext  (was: 
DistributedShell can't specify LogAggregationContext)

> DistributedShell cannot specify LogAggregationContext
> -
>
> Key: YARN-9573
> URL: https://issues.apache.org/jira/browse/YARN-9573
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: distributed-shell, log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9573.001.patch
>
>
> When DShell sends the application request object to the RM, it doesn't 
> specify the LogAggregationContext object - thus it is not possible to run 
> DShell with various log-aggregation configurations, for example rolling log 
> aggregation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
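
As context for the change, a YARN client can attach a LogAggregationContext 
to its ApplicationSubmissionContext at submission time; a minimal sketch 
follows. The pattern values are assumed examples, and the rest of the client 
setup (YarnClient, ContainerLaunchContext, Resource) is elided.

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.LogAggregationContext;

public final class LogAggregationContextExample {
  static void configure(ApplicationSubmissionContext appContext) {
    // includePattern/excludePattern apply when the application finishes;
    // the rolled-logs patterns apply while the application is still running,
    // which is what enables rolling log aggregation.
    LogAggregationContext logCtx = LogAggregationContext.newInstance(
        ".*",  // includePattern (assumed example: aggregate everything)
        "",    // excludePattern (exclude nothing)
        ".*",  // rolledLogsIncludePattern (roll everything while running)
        "");   // rolledLogsExcludePattern
    appContext.setLogAggregationContext(logCtx);
  }
}
{code}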



[jira] [Commented] (YARN-9573) DistributedShell can't specify LogAggregationContext

2019-05-24 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847472#comment-16847472
 ] 

Szilard Nemeth commented on YARN-9573:
--

Thanks [~Prabhu Joseph]! Then I'm giving +1 (non-binding)

> DistributedShell can't specify LogAggregationContext
> 
>
> Key: YARN-9573
> URL: https://issues.apache.org/jira/browse/YARN-9573
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: distributed-shell, log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9573.001.patch
>
>
> When DShell sends the application request object to the RM, it doesn't 
> specify the LogAggregationContext object - thus it is not possible to run 
> DShell with various log-aggregation configurations, for example rolling log 
> aggregation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-05-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847468#comment-16847468
 ] 

Hadoop QA commented on YARN-9580:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 11s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m  9s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}128m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestLeaderElectorService |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9580 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12969609/YARN-9580.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4005624868e2 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 460ba7f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/24145/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24145/testReport/ |
| Max. process+thread count | 919 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-9573) DistributedShell can't specify LogAggregationContext

2019-05-24 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847467#comment-16847467
 ] 

Prabhu Joseph commented on YARN-9573:
-

[~snemeth] The test case failures are not related and will be fixed by YARN-9452.

> DistributedShell can't specify LogAggregationContext
> 
>
> Key: YARN-9573
> URL: https://issues.apache.org/jira/browse/YARN-9573
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: distributed-shell, log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9573.001.patch
>
>
> When DShell sends the application request object to the RM, it doesn't 
> specify the LogAggregationContext object - thus it is not possible to run 
> DShell with various log-aggregation configurations, for example rolling log 
> aggregation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9573) DistributedShell can't specify LogAggregationContext

2019-05-24 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847449#comment-16847449
 ] 

Szilard Nemeth commented on YARN-9573:
--

Hi [~adam.antal]!
Thanks for this patch!

I know the checkstyle issue may not be strongly related, but if you can fix it 
easily, please do so.
Is the unit test failure related to your patch?
Otherwise, the patch looks good!

> DistributedShell can't specify LogAggregationContext
> 
>
> Key: YARN-9573
> URL: https://issues.apache.org/jira/browse/YARN-9573
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: distributed-shell, log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9573.001.patch
>
>
> When DShell sends the application request object to the RM, it doesn't 
> specify the LogAggregationContext object - thus it is not possible to run 
> DShell with various log-aggregation configurations, for example rolling log 
> aggregation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9573) DistributedShell can't specify LogAggregationContext

2019-05-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847422#comment-16847422
 ] 

Hadoop QA commented on YARN-9573:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  3s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 15s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell:
 The patch generated 1 new + 89 unchanged - 1 fixed = 90 total (was 90) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 22m 23s{color} 
| {color:red} hadoop-yarn-applications-distributedshell in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 87m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.applications.distributedshell.TestDistributedShell |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9573 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12969597/YARN-9573.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3140f21bc6dc 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 460ba7f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Updated] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-05-24 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9580:
---
Attachment: YARN-9580.001.patch

> Fulfilled reservation information in assignment is lost when transferring in 
> ParentQueue#assignContainers
> -
>
> Key: YARN-9580
> URL: https://issues.apache.org/jira/browse/YARN-9580
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9580.001.patch
>
>
> When transferring an assignment from a child queue to its parent queue, the 
> fulfilled reservation information in the assignment, including 
> fulfilledReservation and fulfilledReservedContainer, is lost.
> When multi-node placement is enabled, this loss can cause a problem: an 
> allocation proposal is generated but can't be accepted, because 
> FiCaSchedulerApp#commonCheckContainerAllocation checks the fulfilled 
> reservation information. This endless loop will always be there, and the 
> node's resources can't be used anymore.
> In HB-driven scheduling mode, a fulfilled reservation can be allocated via 
> another call stack: CapacityScheduler#allocateContainersToNode --> 
> CapacityScheduler#allocateContainerOnSingleNode --> 
> CapacityScheduler#allocateFromReservedContainer. In this way the assignment 
> can be generated by the leaf queue and submitted directly, which I think is 
> why we hardly ever saw this problem before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
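
The shape of the fix would be to copy the fulfilled-reservation fields when 
the parent queue builds its own assignment from the child's. A minimal sketch 
follows, assuming CSAssignment exposes the usual accessors for these fields; 
the copyFulfilledReservation helper is illustrative, not the actual patch.

{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.CSAssignment;

public final class AssignmentTransferExample {
  /**
   * Propagates the fulfilled-reservation information from the child queue's
   * assignment to the parent's, so that
   * FiCaSchedulerApp#commonCheckContainerAllocation can accept the allocation
   * proposal instead of rejecting it forever.
   */
  static void copyFulfilledReservation(CSAssignment child, CSAssignment parent) {
    parent.setFulfilledReservation(child.isFulfilledReservation());
    parent.setFulfilledReservedContainer(child.getFulfilledReservedContainer());
  }
}
{code}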



[jira] [Created] (YARN-9580) Fulfilled reservation information in assignment is lost when transferring in ParentQueue#assignContainers

2019-05-24 Thread Tao Yang (JIRA)
Tao Yang created YARN-9580:
--

 Summary: Fulfilled reservation information in assignment is lost 
when transferring in ParentQueue#assignContainers
 Key: YARN-9580
 URL: https://issues.apache.org/jira/browse/YARN-9580
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Tao Yang
Assignee: Tao Yang


When transferring an assignment from a child queue to its parent queue, the 
fulfilled reservation information in the assignment, including 
fulfilledReservation and fulfilledReservedContainer, is lost.

When multi-node placement is enabled, this loss can cause a problem: an 
allocation proposal is generated but can't be accepted, because 
FiCaSchedulerApp#commonCheckContainerAllocation checks the fulfilled 
reservation information. This endless loop will always be there, and the 
node's resources can't be used anymore.

In HB-driven scheduling mode, a fulfilled reservation can be allocated via 
another call stack: CapacityScheduler#allocateContainersToNode --> 
CapacityScheduler#allocateContainerOnSingleNode --> 
CapacityScheduler#allocateFromReservedContainer. In this way the assignment 
can be generated by the leaf queue and submitted directly, which I think is 
why we hardly ever saw this problem before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9573) DistributedShell can't specify LogAggregationContext

2019-05-24 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847316#comment-16847316
 ] 

Adam Antal commented on YARN-9573:
--

Uploaded patch v1. It supports adding a simple pattern to the include pattern 
of the log aggregation context, and it disables the exclusion pattern. This is 
enough to start a DShell with rolling-mode log aggregation.

No test is added though. I'm unsure how that could be added; many of the other 
options are untested as well. (Also, 80% of the DShell tests are timing out 
locally for me.)

> DistributedShell can't specify LogAggregationContext
> 
>
> Key: YARN-9573
> URL: https://issues.apache.org/jira/browse/YARN-9573
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: distributed-shell, log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9573.001.patch
>
>
> When DShell sends the application request object to the RM, it doesn't 
> specify the LogAggregationContext object - thus it is not possible to run 
> DShell with various log-aggregation configurations, for example rolling log 
> aggregation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
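
Based on that description, the v1 behavior would correspond to a client-side 
call like the one below; the pattern value and the empty exclusion string are 
assumptions inferred from the comment, not taken from the patch itself.

{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.LogAggregationContext;

public final class SimpleIncludePatternExample {
  static void configure(ApplicationSubmissionContext appContext) {
    // Only an include pattern is supplied; the exclusion pattern is disabled
    // by passing an empty string (assumed illustration, not the patch code).
    appContext.setLogAggregationContext(
        LogAggregationContext.newInstance("stdout|stderr", ""));
  }
}
{code}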



[jira] [Updated] (YARN-9573) DistributedShell can't specify LogAggregationContext

2019-05-24 Thread Adam Antal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-9573:
-
Attachment: YARN-9573.001.patch

> DistributedShell can't specify LogAggregationContext
> 
>
> Key: YARN-9573
> URL: https://issues.apache.org/jira/browse/YARN-9573
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: distributed-shell, log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9573.001.patch
>
>
> When DShell sends the application request object to the RM, it doesn't 
> specify the LogAggregationContext object - thus it is not possible to run 
> DShell with various log-aggregation configurations, for example rolling log 
> aggregation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org