[jira] [Commented] (YARN-9473) [Umbrella] Support Vector Engine ( a new accelerator hardware) based on pluggable device framework

2019-04-29 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829931#comment-16829931
 ] 

Hudson commented on YARN-9473:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16479 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16479/])
YARN-9476. [YARN-9473] Create unit tests for VE plugin. Contributed by (ztang: 
rev 7fbaa7d66f3ff40b80b70d4563545035e91e44a6)
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/com/nec/TestNECVEPlugin.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/com/nec/NECVEPlugin.java


> [Umbrella] Support Vector Engine ( a new accelerator hardware) based on 
> pluggable device framework
> --
>
> Key: YARN-9473
> URL: https://issues.apache.org/jira/browse/YARN-9473
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Zhankun Tang
>Assignee: Peter Bacsko
>Priority: Major
>
> As the trend toward heterogeneous computation rises, new acceleration hardware 
> such as GPUs and FPGAs is used to satisfy various requirements.
> The Vector Engine (VE), released by NEC, is another example of such hardware. 
> The VE is similar to a GPU but has different characteristics: it is well suited 
> to machine learning and HPC due to better memory bandwidth and no PCIe 
> bottleneck.
> Please check here for more VE details:
> [https://www.nextplatform.com/2017/11/22/deep-dive-necs-aurora-vector-engine/]
> [https://www.hotchips.org/hc30/2conf/2.14_NEC_vector_NEC_SXAurora_TSUBASA_HotChips30_finalb.pdf]
> As we know, YARN-8851 provides a pluggable device framework which makes it easy 
> to develop a plugin for such new accelerators. This JIRA proposes to develop a 
> new VE plugin based on that framework, implemented in the same way as the 
> current GPU plugin "NvidiaGPUPluginForRuntimeV2".
>  
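For context, a plugin under the pluggable device framework implements its DevicePlugin interface. Below is a minimal sketch only, assuming the trunk-era interface from YARN-8851 (getRegisterRequestInfo/getDevices/onDevicesAllocated/onDevicesReleased); the resource name, device path, and major/minor numbers are illustrative placeholders, not values from the actual NEC plugin:
{code:java}
import java.util.Set;
import java.util.TreeSet;
import org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin.*;

// Sketch of a VE DevicePlugin; all concrete values are placeholders.
public class SketchVEPlugin implements DevicePlugin {
  @Override
  public DeviceRegisterRequest getRegisterRequestInfo() throws Exception {
    // Register the resource type that containers will request.
    return DeviceRegisterRequest.Builder.newInstance()
        .setResourceName("nec.com/ve").build();
  }

  @Override
  public Set<Device> getDevices() throws Exception {
    // Discover VE cards on the node (e.g. by scanning /dev/ve*) and
    // report one Device per card.
    Set<Device> devices = new TreeSet<>();
    devices.add(Device.Builder.newInstance()
        .setId(0)
        .setDevPath("/dev/ve0")   // placeholder path and numbers
        .setMajorNumber(243)
        .setMinorNumber(0)
        .setHealthy(true)
        .build());
    return devices;
  }

  @Override
  public DeviceRuntimeSpec onDevicesAllocated(Set<Device> allocated,
      YarnRuntimeType yarnRuntime) throws Exception {
    return null; // rely on the framework's default isolation handling
  }

  @Override
  public void onDevicesReleased(Set<Device> released) throws Exception {
    // no per-release cleanup in this sketch
  }
}
{code}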



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9476) Create unit tests for VE plugin

2019-04-29 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829930#comment-16829930
 ] 

Hudson commented on YARN-9476:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16479 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16479/])
YARN-9476. [YARN-9473] Create unit tests for VE plugin. Contributed by (ztang: 
rev 7fbaa7d66f3ff40b80b70d4563545035e91e44a6)
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/com/nec/TestNECVEPlugin.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/com/nec/NECVEPlugin.java


> Create unit tests for VE plugin
> ---
>
> Key: YARN-9476
> URL: https://issues.apache.org/jira/browse/YARN-9476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9476-001.patch, YARN-9476-002.patch, 
> YARN-9476-003.patch, YARN-9476-004.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9510) Proxyuser access timeline and getdelegationtoken failed without Timeline server restart

2019-04-29 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829927#comment-16829927
 ] 

Hadoop QA commented on YARN-9510:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 39s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
3s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  5s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 14 new + 241 unchanged - 0 fixed = 255 total (was 241) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 49s{color} 
| {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
37s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
44s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
31s{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 25m 46s{color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}125m 39s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | TEST-TestYarnConfigurationFields |
|   | hadoop.yarn.client.cli.TestRMAdminCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce 

[jira] [Commented] (YARN-9476) Create unit tests for VE plugin

2019-04-29 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829917#comment-16829917
 ] 

Zhankun Tang commented on YARN-9476:


[~snemeth] Thanks for the review! [~pbacsko] Thanks for the patch!
+1. The patch is committed.

> Create unit tests for VE plugin
> ---
>
> Key: YARN-9476
> URL: https://issues.apache.org/jira/browse/YARN-9476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9476-001.patch, YARN-9476-002.patch, 
> YARN-9476-003.patch, YARN-9476-004.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9510) Proxyuser access timeline and getdelegationtoken failed without Timeline server restart

2019-04-29 Thread Shen Yinjie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie updated YARN-9510:
--
Attachment: YARN-9510_1.patch

> Proxyuser access timeline and getdelegationtoken failed without Timeline 
> server restart
> ---
>
> Key: YARN-9510
> URL: https://issues.apache.org/jira/browse/YARN-9510
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Affects Versions: 3.1.0
>Reporter: Shen Yinjie
>Assignee: Shen Yinjie
>Priority: Major
> Attachments: YARN-9510_1.patch
>
>
> We add a proxyuser by changing "hadoop.proxyuser.xx.yy"; if the timeline 
> server is not restarted, the YARN job will fail and throw:
> {code:java}
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Authentication failed, URL: 
> http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa,
>  status: 403, message: Forbidden
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401)
>   at 
> org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213)
> {code}
> It seems that the proxyuser info in the timeline server has not been refreshed.
> In a production cluster, we sometimes add a new proxy user at runtime and 
> expect impersonation to take effect after executing a command like 
> "...refreshSuperUserGroupsConfiguration", without restarting the timeline server.
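For reference, the refresh primitive itself already exists in Hadoop common; a minimal sketch of the kind of hook the timeline server would need (the wrapper class and method here are hypothetical; only ProxyUsers.refreshSuperUserGroupsConfiguration is real):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.authorize.ProxyUsers;

// Hypothetical refresh hook for the Timeline server: re-reads the
// hadoop.proxyuser.* settings so a newly added proxy user takes effect
// without a restart.
public class TimelineProxyUserRefresher {
  public static void refresh(Configuration conf) {
    ProxyUsers.refreshSuperUserGroupsConfiguration(conf);
  }
}
{code}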



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options

2019-04-29 Thread Sudhir Babu Pothineni (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829704#comment-16829704
 ] 

Sudhir Babu Pothineni edited comment on YARN-9520 at 4/29/19 8:54 PM:
--

Let's say there are 100 applications in queue A, and 10 of them are running, 
occupying 100% of the cluster. In my case they should keep running even after 
the fair-share timeout; the remaining jobs should be allocated only as 
containers finish in the running jobs. But I think these running jobs are 
preempted by waiting jobs after the fair-share timeout. Preemption is enabled 
because queue B or C can become active at any time. If I set maximum 
applications per queue to 10, the cluster is underutilized.

Capacity scheduler has inter-queue-preemption.enabled and 
intra-queue-preemption.enabled; is there any specific reason they are not 
available in the fair scheduler?

 


was (Author: sbpothineni):
Lets say 100 applications in the queue A, 10 applications are running occupied 
100% of the cluster, In my case they should keep running even after Fairshare 
timeout, only after as soon as the containers finished from running jobs, 
remaining jobs should be allocated. But I think these running jobs are 
preempted by waiting jobs after Fair share timeout, Preemption enabled because 
Queue B or C can be active any time. If I put maximum applications per queue 
10, cluster is under utilized. 

Capacity scheduler has inter-queue-preemption.enabled, 
intra-queue-preemption.enabled, is there any specifica reason they are not 
there in fair scheduler?

 

> fair scheduler: inter-queue-preemption.enabled, 
> intra-queue-preemption.enabled options
> --
>
> Key: YARN-9520
> URL: https://issues.apache.org/jira/browse/YARN-9520
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sudhir Babu Pothineni
>Priority: Major
>
> It would be good to have inter-queue-preemption-enabled and 
> intra-queue-preemption-enabled options for the fair scheduler; I have a use 
> case where we need inter-queue-preemption-enabled=false.
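For reference, the Fair Scheduler's existing preemption knobs are coarser: a global switch in yarn-site.xml plus a per-queue allowPreemptionFrom element in the allocation file. A minimal sketch of the closest available configuration (assuming Hadoop 2.9+/3.x); it is not a full substitute for the requested inter/intra-queue options:
{code:java}
<!-- yarn-site.xml: global Fair Scheduler preemption switch -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>

<!-- fair-scheduler.xml: containers in queueA may not be preempted, roughly
     approximating inter-queue-preemption-enabled=false for that queue -->
<queue name="queueA">
  <allowPreemptionFrom>false</allowPreemptionFrom>
</queue>
{code}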



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options

2019-04-29 Thread Sudhir Babu Pothineni (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829704#comment-16829704
 ] 

Sudhir Babu Pothineni commented on YARN-9520:
-

Let's say there are 100 applications in queue A, and 10 of them are running, 
occupying 100% of the cluster. In my case they should keep running even after 
the fair-share timeout; the remaining jobs should be allocated only as 
containers finish in the running jobs. But I think these running jobs are 
preempted by waiting jobs after the fair-share timeout. Preemption is enabled 
because queue B or C can become active at any time. If I set maximum 
applications per queue to 10, the cluster is underutilized.

Capacity scheduler has inter-queue-preemption.enabled and 
intra-queue-preemption.enabled; is there any specific reason they are not 
available in the fair scheduler?

 

> fair scheduler: inter-queue-preemption.enabled, 
> intra-queue-preemption.enabled options
> --
>
> Key: YARN-9520
> URL: https://issues.apache.org/jira/browse/YARN-9520
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sudhir Babu Pothineni
>Priority: Major
>
> It would be good to have inter-queue-preemption-enabled and 
> intra-queue-preemption-enabled options for the fair scheduler; I have a use 
> case where we need inter-queue-preemption-enabled=false.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options

2019-04-29 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829685#comment-16829685
 ] 

Yufei Gu commented on YARN-9520:


Could you elaborate on the use case?

> fair scheduler: inter-queue-preemption.enabled, 
> intra-queue-preemption.enabled options
> --
>
> Key: YARN-9520
> URL: https://issues.apache.org/jira/browse/YARN-9520
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sudhir Babu Pothineni
>Priority: Major
>
> It would be good to have inter-queue-preemption-enabled and 
> intra-queue-preemption-enabled options for the fair scheduler; I have a use 
> case where we need inter-queue-preemption-enabled=false.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options

2019-04-29 Thread Sudhir Babu Pothineni (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudhir Babu Pothineni updated YARN-9520:

Summary: fair scheduler: inter-queue-preemption.enabled, 
intra-queue-preemption.enabled options  (was: fair scheduler: 
inter-queue-preemption-enabled, intra-queue-preemption-enabled options)

> fair scheduler: inter-queue-preemption.enabled, 
> intra-queue-preemption.enabled options
> --
>
> Key: YARN-9520
> URL: https://issues.apache.org/jira/browse/YARN-9520
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sudhir Babu Pothineni
>Priority: Major
>
> It would be good to have inter-queue-preemption-enabled and 
> intra-queue-preemption-enabled options for the fair scheduler; I have a use 
> case where we need inter-queue-preemption-enabled=false.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9520) fair scheduler: inter-queue-preemption-enabled, intra-queue-preemption-enabled options

2019-04-29 Thread Sudhir Babu Pothineni (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudhir Babu Pothineni updated YARN-9520:

Summary: fair scheduler: inter-queue-preemption-enabled, 
intra-queue-preemption-enabled options  (was: fair scheduler: 
inter-queue-preemption-enabled, intra-queue-preemption-eblaned  options)

> fair scheduler: inter-queue-preemption-enabled, 
> intra-queue-preemption-enabled options
> --
>
> Key: YARN-9520
> URL: https://issues.apache.org/jira/browse/YARN-9520
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sudhir Babu Pothineni
>Priority: Major
>
> It would be good to have inter-queue-preemption-enabled and 
> intra-queue-preemption-enabled options for the fair scheduler; I have a use 
> case where we need inter-queue-preemption-enabled=false.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9520) fair scheduler: inter-queue-preemption-enabled, intra-queue-preemption-eblaned options

2019-04-29 Thread Sudhir Babu Pothineni (JIRA)
Sudhir Babu Pothineni created YARN-9520:
---

 Summary: fair scheduler: inter-queue-preemption-enabled, 
intra-queue-preemption-eblaned  options
 Key: YARN-9520
 URL: https://issues.apache.org/jira/browse/YARN-9520
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Sudhir Babu Pothineni


It would be good to have inter-queue-preemption-enabled and 
intra-queue-preemption-enabled options for the fair scheduler; I have a use 
case where we need inter-queue-preemption-enabled=false.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Tan, Wangda (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829570#comment-16829570
 ] 

Tan, Wangda commented on YARN-9517:
---

[~shurong.mai], thanks for putting up a patch. However, I'm not sure why you 
closed this Jira. Is the patch or fix already in the mentioned branches?

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: patch
> Attachments: YARN-9517.patch
>
>
> yarn-site.xml
> {code:java}
> 
> yarn.log-aggregation-enable
> false
> 
> {code}
>  
> When aggregation is not enabled, we click the "container log link" (in the web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;)
>  after a job has finished successfully.
> It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038", and the URL is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;
> I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I have 
> submitted a patch, which is simple and applies to all of these versions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7

2019-04-29 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829328#comment-16829328
 ] 

Jim Brennan commented on YARN-9518:
---

[~shurong.mai], are you running with the latest code (trunk)?   The patch you 
put up looks like it is based on a version of CgroupsLCEResourcesHandler() from 
before 5/19/2017 (YARN-5301).

Can you verify the problem exists in trunk?

 

 

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518.patch
>
>
> The OS version is centos7.
>  
> When I set the configuration variables for cgroups with YARN, the nodemanager 
> started without any problem. But when I ran a job, the job failed with the 
> exceptional nodemanager logs shown at the end.
> In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing, I found the reason. In centos6, the cgroup "cpu" and 
> "cpuacct" subsystems are mounted as follows: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" 
> are symbolic links. 
> Looking at the source code, the nodemanager gets the cgroup subsystem info by 
> reading /proc/mounts, so it also gets "/sys/fs/cgroup/cpu,cpuacct" as the path 
> of both the cpu and cpuacct subsystems. 
> The resource description argument of container-executor is then as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. Therefore, the cgroup path is truncated to 
> "/sys/fs/cgroup/cpu" rather than the correct cgroup path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
> and the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
> directory" is reported in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the nodemanager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in the path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job can run 
> successfully.
> The patch applies generally to cgroup subsystem paths, such as the cgroup 
> network subsystems:  
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> 
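To make the truncation concrete: the container-executor command line treats commas as separators between resource arguments, so any controller path containing "cpu,cpuacct" is cut at the comma. A minimal, self-contained sketch of the patch's idea (not the actual patch; the method name is illustrative): prefer the single-controller symlink when the mount point from /proc/mounts names several controllers.
{code:java}
import java.io.File;

public class CGroupPathSketch {
  // If the mount point is e.g. /sys/fs/cgroup/cpu,cpuacct, prefer the
  // per-controller symlink /sys/fs/cgroup/cpu so that no comma reaches
  // the comma-separated container-executor argument list.
  static String controllerPath(String mountPoint, String controller) {
    File mount = new File(mountPoint);
    if (mount.getName().contains(",")) {
      File alias = new File(mount.getParentFile(), controller);
      if (alias.exists()) {
        return alias.getAbsolutePath();
      }
    }
    return mountPoint;
  }

  public static void main(String[] args) {
    // On centos7 this prints /sys/fs/cgroup/cpu (the symlink), avoiding
    // the comma; on systems without the symlink it falls back unchanged.
    System.out.println(controllerPath("/sys/fs/cgroup/cpu,cpuacct", "cpu"));
  }
}
{code}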

[jira] [Commented] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config

2019-04-29 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829306#comment-16829306
 ] 

Hadoop QA commented on YARN-9519:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
50s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 55m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9519 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12967377/YARN-9519.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 258c4f499640 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1cef194 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24029/testReport/ |
| Max. process+thread count | 455 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24029/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> TFile log 

[jira] [Updated] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config

2019-04-29 Thread Adam Antal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-9519:
-
Attachment: YARN-9519.001.patch

> TFile log aggregation file format is insensitive to the 
> yarn.log-aggregation.TFile.remote-app-log-dir config
> 
>
> Key: YARN-9519
> URL: https://issues.apache.org/jira/browse/YARN-9519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9519.001.patch
>
>
> The TFile log aggregation file format is not sensitive to the 
> yarn.log-aggregation.TFile.remote-app-log-dir config.
> In {{LogAggregationTFileController$initInternal}}:
> {code:java}
> this.remoteRootLogDir = new Path(
> conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
> YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR));
> {code}
> So the remoteRootLogDir is only aware of the 
> yarn.nodemanager.remote-app-log-dir config, while other file formats, like 
> IFile, consult the file-format-specific config first, so it takes priority.
> From {{LogAggregationIndexedFileController$initInternal}}:
> {code:java}
> String remoteDirStr = String.format(
> YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT,
> this.fileControllerName);
> String remoteDir = conf.get(remoteDirStr);
> if (remoteDir == null || remoteDir.isEmpty()) {
>   remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
>   YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR);
> }
> {code}
> (Where these configs are: )
> {code:java}
> public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT
>   = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir";
> public static final String NM_REMOTE_APP_LOG_DIR = 
> NM_PREFIX + "remote-app-log-dir";
> {code}
> I suggest TFile should try to obtain the remote dir config from 
> yarn.log-aggregation.TFile.remote-app-log-dir first, and only if that is not 
> specified fall back to the yarn.nodemanager.remote-app-log-dir config.
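A sketch of the suggested change, simply reusing the fallback pattern quoted above from LogAggregationIndexedFileController inside the TFile controller's initInternal (where fileControllerName would be "TFile"):
{code:java}
// Sketch only: mirror the IFile lookup order in LogAggregationTFileController.
String remoteDirStr = String.format(
    YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT,
    this.fileControllerName);
String remoteDir = conf.get(remoteDirStr);
if (remoteDir == null || remoteDir.isEmpty()) {
  // Fall back to the generic NodeManager setting only when the
  // TFile-specific config is absent.
  remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
      YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR);
}
this.remoteRootLogDir = new Path(remoteDir);
{code}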



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Summary: can't use CGroups with YARN in centos7   (was: can not use CGroups 
with YARN in centos7 )

> can't use CGroups with YARN in centos7 
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518.patch
>
>
> The OS version is centos7.
>  
> When I set the configuration variables for cgroups with YARN, the nodemanager 
> started without any problem. But when I ran a job, the job failed with the 
> exceptional nodemanager logs shown at the end.
> In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing, I found the reason. In centos6, the cgroup "cpu" and 
> "cpuacct" subsystems are mounted as follows: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" 
> are symbolic links. 
> Looking at the source code, the nodemanager gets the cgroup subsystem info by 
> reading /proc/mounts, so it also gets "/sys/fs/cgroup/cpu,cpuacct" as the path 
> of both the cpu and cpuacct subsystems. 
> The resource description argument of container-executor is then as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. Therefore, the cgroup path is truncated to 
> "/sys/fs/cgroup/cpu" rather than the correct cgroup path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
> and the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
> directory" is reported in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the nodemanager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in the path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job can run 
> successfully.
> The patch applies generally to cgroup subsystem paths, such as the cgroup 
> network subsystems:  
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  2019-04-19 20:17:20,108 INFO 
> 

[jira] [Commented] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829233#comment-16829233
 ] 

Shurong Mai commented on YARN-9518:
---

Hi [~adam.antal], I have completed the description of this issue and submitted 
a patch; please review.

> can not use CGroups with YARN in centos7 
> -
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518.patch
>
>
> The OS version is centos7.
>  
> When I set the configuration variables for cgroups with YARN, the nodemanager 
> started without any problem. But when I ran a job, the job failed with the 
> exceptional nodemanager logs shown at the end.
> In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing, I found the reason. In centos6, the cgroup "cpu" and 
> "cpuacct" subsystems are mounted as follows: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in centos7, as follows:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" 
> are symbolic links. 
> Looking at the source code, the nodemanager gets the cgroup subsystem info by 
> reading /proc/mounts, so it also gets "/sys/fs/cgroup/cpu,cpuacct" as the path 
> of both the cpu and cpuacct subsystems. 
> The resource description argument of container-executor is then as follows: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. Therefore, the cgroup path is truncated to 
> "/sys/fs/cgroup/cpu" rather than the correct cgroup path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
> and the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
> directory" is reported in the log.
> Hence I modified the source code and submitted a patch. The idea of the patch 
> is that the nodemanager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
> than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
> argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in the path, and it is a valid path because 
> "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job can run 
> successfully.
> The patch applies generally to cgroup subsystem paths, such as the cgroup 
> network subsystems:  
> {code:java}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio{code}
>  
>  
> ##
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  

[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
The OS version is centos7.

When I set the configuration variables for cgroups with YARN, the nodemanager 
started without any problem. But when I ran a job, the job failed with the 
exceptional nodemanager logs shown at the end.

In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
node manager - Is a directory".

After analysing, I found the reason. In centos6, the cgroup "cpu" and 
"cpuacct" subsystems are mounted as follows: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in centos7, as follows:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" 
are symbolic links. 

Looking at the source code, the nodemanager gets the cgroup subsystem info by 
reading /proc/mounts, so it also gets "/sys/fs/cgroup/cpu,cpuacct" as the path 
of both the cpu and cpuacct subsystems. 

The resource description argument of container-executor is then as follows: 
{code:java}
cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
There is a comma in the cgroup path, but the comma is the separator between 
multiple resources. Therefore, the cgroup path is truncated to 
"/sys/fs/cgroup/cpu" rather than the correct cgroup path 
"/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
and the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
directory" is reported in the log.

Hence I modified the source code and submitted a patch. The idea of the patch 
is that the nodemanager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
argument of container-executor becomes: 
{code:java}
cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
Note that there is no comma in the path, and it is a valid path because 
"/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 

After applying the patch, the problem is resolved and the job can run 
successfully.

The patch applies generally to cgroup subsystem paths, such as the cgroup 
network subsystems:  
{code:java}
/sys/fs/cgroup/net_cls -> net_cls,net_prio
/sys/fs/cgroup/net_prio -> net_cls,net_prio
/sys/fs/cgroup/net_cls,net_prio{code}
 

 

##
{panel:title=exceptional nodemanager logs:}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 2019-04-19 20:17:20,108 INFO 

[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
The OS version is centos7.

When I set the configuration variables for cgroups with YARN, the nodemanager 
started without any problem. But when I ran a job, the job failed with the 
exceptional nodemanager logs shown at the end.

In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
node manager - Is a directory".

After analysing, I found the reason. In centos6, the cgroup "cpu" and 
"cpuacct" subsystems are mounted as follows: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in centos7, as follows:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" 
are symbolic links. 

Looking at the source code, the nodemanager gets the cgroup subsystem info by 
reading /proc/mounts, so it also gets "/sys/fs/cgroup/cpu,cpuacct" as the path 
of both the cpu and cpuacct subsystems. 

The resource description argument of container-executor is then as follows: 
{code:java}
cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
There is a comma in the cgroup path, but the comma is the separator between 
multiple resources. Therefore, the cgroup path is truncated to 
"/sys/fs/cgroup/cpu" rather than the correct cgroup path 
"/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", 
and the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a 
directory" is reported in the log.

Hence I modified the source code and submitted a patch. The idea of the patch 
is that the nodemanager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
argument of container-executor becomes: 
{code:java}
cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
Note that there is no comma in the path, and it is a valid path because 
"/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". 

After applying the patch, the problem is resolved and the job can run 
successfully.

The patch applies generally to cgroup subsystem paths, such as the cgroup 
network subsystems:  
{code:java}
/sys/fs/cgroup/net_cls -> net_cls,net_prio
/sys/fs/cgroup/net_prio -> net_cls,net_prio
/sys/fs/cgroup/net_cls,net_prio{code}
{panel:title=exceptional nodemanager logs}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.run(Shell.java:482)
 2019-04-19 20:17:20,109 INFO 

[jira] [Assigned] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config

2019-04-29 Thread Adam Antal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal reassigned YARN-9519:


Assignee: Adam Antal

> TFile log aggregation file format is insensitive to the 
> yarn.log-aggregation.TFile.remote-app-log-dir config
> 
>
> Key: YARN-9519
> URL: https://issues.apache.org/jira/browse/YARN-9519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
>
> The TFile log aggregation file format is not sensitive to the 
> yarn.log-aggregation.TFile.remote-app-log-dir config.
> In {{LogAggregationTFileController$initInternal}}:
> {code:java}
> this.remoteRootLogDir = new Path(
> conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
> YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR));
> {code}
> So the remoteRootLogDir is only aware of the 
> yarn.nodemanager.remote-app-log-dir config, while other file formats, like 
> IFile, consult the file-format-specific config first, so it takes priority.
> From {{LogAggregationIndexedFileController$initInternal}}:
> {code:java}
> String remoteDirStr = String.format(
> YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT,
> this.fileControllerName);
> String remoteDir = conf.get(remoteDirStr);
> if (remoteDir == null || remoteDir.isEmpty()) {
>   remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
>   YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR);
> }
> {code}
> (Where these configs are: )
> {code:java}
> public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT
>   = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir";
> public static final String NM_REMOTE_APP_LOG_DIR = 
> NM_PREFIX + "remote-app-log-dir";
> {code}
> I suggest TFile should try to obtain the remote dir config from 
> yarn.log-aggregation.TFile.remote-app-log-dir first, and only if that is not 
> specified fall back to the yarn.nodemanager.remote-app-log-dir config.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829224#comment-16829224
 ] 

Hadoop QA commented on YARN-9518:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} YARN-9518 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9518 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12967374/YARN-9518.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24028/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> can not use CGroups with YARN in centos7 
> -
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
> Attachments: YARN-9518.patch
>
>
> The OS version is CentOS 7.
>  
> When I set the cgroup configuration variables for YARN, the NodeManager 
> started without any problem. But when I ran a job, the job failed with the 
> exceptional NodeManager logs shown at the end.
> In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
> node manager - Is a directory".
> After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
> "cpuacct" subsystems are mounted as follows: 
> {code:java}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in CentOS 7 they look like this:
> {code:java}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct{code}
> "cpu" and "cpuacct" have been merged into "cpu,cpuacct", and "cpu" and 
> "cpuacct" are symbolic links. 
> Looking at the source code, the NodeManager reads the cgroup subsystem info 
> by parsing /proc/mounts, so it resolves both the cpu and cpuacct subsystem 
> paths to "/sys/fs/cgroup/cpu,cpuacct". 
> The resource description argument passed to container-executor then looks 
> like this: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between 
> multiple resources. Therefore the cgroup path is truncated to 
> "/sys/fs/cgroup/cpu" rather than the correct path 
> "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks",
> which produces the log error "Can't open file /sys/fs/cgroup/cpu as node 
> manager - Is a directory".
> Hence I modified the source code and submitted a patch. The idea of the 
> patch is that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" 
> rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource 
> description argument of container-executor becomes: 
> {code:java}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is still a valid path 
> because "/sys/fs/cgroup/cpu" is a symbolic link to 
> "/sys/fs/cgroup/cpu,cpuacct". 
> After applying the patch, the problem is resolved and the job runs 
> successfully.
>  
> {panel:title=exceptional nodemanager logs}
> 2019-04-19 20:17:20,095 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
> to RUNNING
>  2019-04-19 20:17:20,101 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_1554210318404_0042_01_01 is : 27
>  2019-04-19 20:17:20,103 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
> from container-launch with container ID: container_155421031840
>  4_0042_01_01 and exit code: 27
>  ExitCodeException exitCode=27:
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
>  at org.apache.hadoop.util.Shell.run(Shell.java:482)
>  at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>  at 

[jira] [Commented] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config

2019-04-29 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829223#comment-16829223
 ] 

Adam Antal commented on YARN-9519:
--

It looks like it is also insensitive to the suffix version of this config:
{code:java}
public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_SUFFIX_FMT
  = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir-suffix";
{code}
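
If the fix follows the same two-level lookup, the suffix config would fall 
back the same way; a sketch, assuming the existing NodeManager-level suffix 
constants in YarnConfiguration:
{code:java}
// Sketch only: the same fallback pattern, applied to the suffix key.
String suffixKey = String.format(
    YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_SUFFIX_FMT,
    this.fileControllerName);  // ...TFile.remote-app-log-dir-suffix
String suffix = conf.get(suffixKey);
if (suffix == null || suffix.isEmpty()) {
  suffix = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR_SUFFIX,
      YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR_SUFFIX);
}
{code}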

> TFile log aggregation file format is insensitive to the 
> yarn.log-aggregation.TFile.remote-app-log-dir config
> 
>
> Key: YARN-9519
> URL: https://issues.apache.org/jira/browse/YARN-9519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Priority: Major
>
> The TFile log aggregation file format is not sensitive to the 
> yarn.log-aggregation.TFile.remote-app-log-dir config.
> In {{LogAggregationTFileController$initInternal}}:
> {code:java}
> this.remoteRootLogDir = new Path(
> conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
> YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR));
> {code}
> So remoteRootLogDir is only aware of the 
> yarn.nodemanager.remote-app-log-dir config, while other file formats, like 
> IFile, consult the format-specific config first, so that config has higher 
> priority.
> From {{LogAggregationIndexedFileController$initInternal}}:
> {code:java}
> String remoteDirStr = String.format(
> YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT,
> this.fileControllerName);
> String remoteDir = conf.get(remoteDirStr);
> if (remoteDir == null || remoteDir.isEmpty()) {
>   remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
>   YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR);
> }
> {code}
> (Where these configs are: )
> {code:java}
> public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT
>   = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir";
> public static final String NM_REMOTE_APP_LOG_DIR = 
> NM_PREFIX + "remote-app-log-dir";
> {code}
> I suggest that TFile first try to obtain the remote dir config from 
> yarn.log-aggregation.TFile.remote-app-log-dir and only fall back to the 
> yarn.nodemanager.remote-app-log-dir config if that is not specified.






[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
The OS version is CentOS 7.

When I set the cgroup configuration variables for YARN, the NodeManager 
started without any problem. But when I ran a job, the job failed with the 
exceptional NodeManager logs shown at the end.

In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
node manager - Is a directory".

After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
"cpuacct" subsystems are mounted as follows: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in CentOS 7 they look like this:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have been merged into "cpu,cpuacct", and "cpu" and 
"cpuacct" are symbolic links. 

Looking at the source code, the NodeManager reads the cgroup subsystem info 
by parsing /proc/mounts, so it resolves both the cpu and cpuacct subsystem 
paths to "/sys/fs/cgroup/cpu,cpuacct". 

The resource description argument passed to container-executor then looks 
like this: 
{code:java}
cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
There is a comma in the cgroup path, but the comma is the separator between 
multiple resources. Therefore the cgroup path is truncated to 
"/sys/fs/cgroup/cpu" rather than the correct path 
"/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks",
which produces the log error "Can't open file /sys/fs/cgroup/cpu as node 
manager - Is a directory".

Hence I modified the source code and submitted a patch. The idea of the patch 
is that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
argument of container-executor becomes: 
{code:java}
cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
Note that there is no comma in this path, and it is still a valid path 
because "/sys/fs/cgroup/cpu" is a symbolic link to 
"/sys/fs/cgroup/cpu,cpuacct". 

After applying the patch, the problem is resolved and the job runs 
successfully.
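
To make the idea concrete, here is a minimal, self-contained sketch of the 
symlink-preferring lookup (hypothetical class and method names, not the 
actual patch):
{code:java}
import java.io.File;
import java.nio.file.Files;

// When the mount point from /proc/mounts contains a comma (the merged
// CentOS 7 "cpu,cpuacct" hierarchy), prefer the per-controller symlink such
// as /sys/fs/cgroup/cpu, because container-executor treats the comma as the
// separator between resources.
public class CGroupPathResolver {

  static String resolveControllerPath(String controller, String mountPath) {
    if (!mountPath.contains(",")) {
      return mountPath; // CentOS 6-style layout, nothing to fix
    }
    // CentOS 7 creates per-controller symlinks next to the merged mount.
    File candidate = new File(new File(mountPath).getParent(), controller);
    if (Files.isSymbolicLink(candidate.toPath()) || candidate.isDirectory()) {
      return candidate.getAbsolutePath(); // e.g. /sys/fs/cgroup/cpu
    }
    return mountPath; // fall back to what /proc/mounts reported
  }

  public static void main(String[] args) {
    // On a CentOS 7 host this prints /sys/fs/cgroup/cpu, which is comma-free
    // and still valid, since it is a symlink to /sys/fs/cgroup/cpu,cpuacct.
    System.out.println(
        resolveControllerPath("cpu", "/sys/fs/cgroup/cpu,cpuacct"));
  }
}
{code}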

 
{panel:title=exceptional nodemanager logs}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.run(Shell.java:482)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 

[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
When I set the cgroup configuration variables for YARN, the NodeManager 
started without any problem. But when I ran a job, the job failed with the 
exceptional NodeManager logs shown at the end.

In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
node manager - Is a directory".

After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
"cpuacct" subsystems are mounted as follows: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in CentOS 7 they look like this:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have been merged into "cpu,cpuacct", and "cpu" and 
"cpuacct" are symbolic links. 

Looking at the source code, the NodeManager reads the cgroup subsystem info 
by parsing /proc/mounts, so it resolves both the cpu and cpuacct subsystem 
paths to "/sys/fs/cgroup/cpu,cpuacct". 

The resource description argument passed to container-executor then looks 
like this:
{code:java}
cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
There is a comma in the cgroup path, but the comma is the separator between 
multiple resources. Therefore the cgroup path is truncated to 
"/sys/fs/cgroup/cpu" rather than the correct path 
"/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks".

{panel:title=exceptional nodemanager logs}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.run(Shell.java:482)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
 2019-04-19 20:17:20,109 INFO 

[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
When I set the cgroup configuration variables for YARN, the NodeManager 
started without any problem. But when I ran a job, the job failed with the 
exceptional NodeManager logs shown at the end.

In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
node manager - Is a directory".

After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
"cpuacct" subsystems are mounted as follows: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in CentOS 7 they look like this:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have been merged into "cpu,cpuacct", and "cpu" and 
"cpuacct" are symbolic links. 

Looking at the source code, the NodeManager reads the cgroup subsystem info 
by parsing /proc/mounts, so it resolves both the cpu and cpuacct subsystem 
paths to "/sys/fs/cgroup/cpu,cpuacct". 

The resource description argument passed to container-executor then looks 
like this: 
{code:java}
cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
There is a comma in the cgroup path, but the comma is the separator between 
multiple resources. Therefore the cgroup path is truncated to 
"/sys/fs/cgroup/cpu" rather than the correct path 
"/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks",
which produces the log error "Can't open file /sys/fs/cgroup/cpu as node 
manager - Is a directory".

Hence I modified the source code and submitted a patch. The idea of the patch 
is that the NodeManager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather 
than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description 
argument of container-executor becomes: 
{code:java}
cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
{code}
Note that there is no comma in this path, and it is still a valid path 
because "/sys/fs/cgroup/cpu" is a symbolic link to 
"/sys/fs/cgroup/cpu,cpuacct". 

After applying the patch, the problem is resolved and the job runs 
successfully.

 
{panel:title=exceptional nodemanager logs}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.run(Shell.java:482)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 

[jira] [Created] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config

2019-04-29 Thread Adam Antal (JIRA)
Adam Antal created YARN-9519:


 Summary: TFile log aggregation file format is insensitive to the 
yarn.log-aggregation.TFile.remote-app-log-dir config
 Key: YARN-9519
 URL: https://issues.apache.org/jira/browse/YARN-9519
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 3.2.0
Reporter: Adam Antal


The TFile log aggregation file format is not sensitive to the 
yarn.log-aggregation.TFile.remote-app-log-dir config.

In {{LogAggregationTFileController$initInternal}}:
{code:java}
this.remoteRootLogDir = new Path(
conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR));
{code}
So remoteRootLogDir is only aware of the 
yarn.nodemanager.remote-app-log-dir config, while other file formats, like 
IFile, consult the format-specific config first, so that config has higher 
priority.

From {{LogAggregationIndexedFileController$initInternal}}:
{code:java}
String remoteDirStr = String.format(
YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT,
this.fileControllerName);
String remoteDir = conf.get(remoteDirStr);
if (remoteDir == null || remoteDir.isEmpty()) {
  remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
  YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR);
}
{code}
(Where these configs are: )
{code:java}
public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT
  = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir";
public static final String NM_REMOTE_APP_LOG_DIR = 
NM_PREFIX + "remote-app-log-dir";
{code}
I suggest that TFile first try to obtain the remote dir config from 
yarn.log-aggregation.TFile.remote-app-log-dir and only fall back to the 
yarn.nodemanager.remote-app-log-dir config if that is not specified.






[jira] [Commented] (YARN-9476) Create unit tests for VE plugin

2019-04-29 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829205#comment-16829205
 ] 

Peter Bacsko commented on YARN-9476:


[~sunilg] [~tangzhankun] could you please also review and commit this patch?

> Create unit tests for VE plugin
> ---
>
> Key: YARN-9476
> URL: https://issues.apache.org/jira/browse/YARN-9476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9476-001.patch, YARN-9476-002.patch, 
> YARN-9476-003.patch, YARN-9476-004.patch
>
>







[jira] [Commented] (YARN-9504) [UI2] Fair scheduler queue view page is broken

2019-04-29 Thread Zoltan Siegl (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829200#comment-16829200
 ] 

Zoltan Siegl commented on YARN-9504:


[~sunilg] could you have a look at this?

> [UI2] Fair scheduler queue view page is broken
> --
>
> Key: YARN-9504
> URL: https://issues.apache.org/jira/browse/YARN-9504
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn-ui-v2
>Affects Versions: 3.2.0, 3.3.0, 3.2.1
>Reporter: Zoltan Siegl
>Assignee: Zoltan Siegl
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: Screenshot 2019-04-23 at 14.52.57.png, Screenshot 
> 2019-04-23 at 14.59.35.png, YARN-9504.001.patch, YARN-9504.002.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The UI2 queue page currently displays a white screen for Fair Scheduler.
>  
> In src/main/webapp/app/components/tree-selector.js:377 (getUsedCapacity), 
> the code refers to queueData.get("partitionMap"), which is null for Fair 
> Scheduler queues.






[jira] [Commented] (YARN-9476) Create unit tests for VE plugin

2019-04-29 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829194#comment-16829194
 ] 

Szilard Nemeth commented on YARN-9476:
--

Hi [~pbacsko]!
+1 (non-binding) for the latest patch!


> Create unit tests for VE plugin
> ---
>
> Key: YARN-9476
> URL: https://issues.apache.org/jira/browse/YARN-9476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9476-001.patch, YARN-9476-002.patch, 
> YARN-9476-003.patch, YARN-9476-004.patch
>
>







[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
When I set the cgroup configuration variables for YARN, the NodeManager 
started without any problem. But when I ran a job, the job failed with the 
exceptional NodeManager logs shown at the end.

In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
node manager - Is a directory".

After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
"cpuacct" subsystems are mounted as follows: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in CentOS 7 they look like this:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have been merged into "cpu,cpuacct", and "cpu" and 
"cpuacct" are symbolic links.

{panel:title=exceptional nodemanager logs}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
 2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
 2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
 4_0042_01_01 and exit code: 27
 ExitCodeException exitCode=27:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 at org.apache.hadoop.util.Shell.run(Shell.java:482)
 at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
 2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.run(Shell.java:482)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.lang.Thread.run(Thread.java:745)
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
 2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell 

[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Description: 
When I set the cgroup configuration variables for YARN, the NodeManager 
started without any problem. But when I ran a job, the job failed with these 
exceptional NodeManager logs shown at the end.

In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as 
node manager - Is a directory".

After analysing it, I found the reason. In CentOS 6, the cgroup "cpu" and 
"cpuacct" subsystems are mounted as follows: 
{code:java}
/sys/fs/cgroup/cpu
/sys/fs/cgroup/cpuacct
{code}
But in CentOS 7 they look like this:
{code:java}
/sys/fs/cgroup/cpu -> cpu,cpuacct
/sys/fs/cgroup/cpuacct -> cpu,cpuacct
/sys/fs/cgroup/cpu,cpuacct{code}
"cpu" and "cpuacct" have been merged into "cpu,cpuacct".

 
{panel:title=exceptional nodemanager logs}
2019-04-19 20:17:20,095 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1554210318404_0042_01_01 transitioned from LOCALIZED 
to RUNNING
2019-04-19 20:17:20,101 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
from container container_1554210318404_0042_01_01 is : 27
2019-04-19 20:17:20,103 WARN 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception 
from container-launch with container ID: container_155421031840
4_0042_01_01 and exit code: 27
ExitCodeException exitCode=27:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
at org.apache.hadoop.util.Shell.run(Shell.java:482)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
container-launch.
2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
container_1554210318404_0042_01_01
2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27
2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: 
ExitCodeException exitCode=27:
2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
2019-04-19 20:17:20,108 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell.run(Shell.java:482)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:   at 
java.lang.Thread.run(Thread.java:745)
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
2019-04-19 20:17:20,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell output: main 
: command provided 1
2019-04-19 20:17:20,109 INFO 

[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Summary: can not use CGroups with YARN in centos7   (was: cgroup subsystem 
in centos7 )

> can not use CGroups with YARN in centos7 
> -
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
>







[jira] [Commented] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829153#comment-16829153
 ] 

Shurong Mai commented on YARN-9518:
---

Hi [~adam.antal], thank you for your attention. I am editing this issue; 
please wait a moment.

> can not use CGroups with YARN in centos7 
> -
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
>







[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Affects Version/s: 3.2.0
   2.9.2
   2.8.5
   2.7.7
   3.1.2

> can not use CGroups with YARN in centos7 
> -
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
>







[jira] [Updated] (YARN-9510) Proxyuser access timeline and getdelegationtoken failed without Timeline server restart

2019-04-29 Thread Shen Yinjie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie updated YARN-9510:
--
Description: 
We add a proxy user by changing "hadoop.proxyuser.xx.yy"; if the timeline 
server is not restarted, the YARN job fails and throws:
{code:java}
Caused by: 
org.apache.hadoop.security.authentication.client.AuthenticationException: 
Authentication failed, URL: 
http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa,
 status: 403, message: Forbidden
at 
org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401)
at 
org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147)
at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213)
{code}
It seems that the proxy-user info in the timeline server has not been 
refreshed.
In a production cluster, we sometimes add a new proxy user at runtime and 
expect the impersonation to take effect after executing a command like 
"...refreshSuperUserGroupsConfiguration", without restarting the timeline 
server.

  was:
We add a proxy user by changing "hadoop.proxyuser.xx.yy" and then execute 
yarn rmadmin -refreshSuperUserGroupsConfiguration, but do not restart the 
timeline server. The MR job fails and throws:

{code:java}
Caused by: 
org.apache.hadoop.security.authentication.client.AuthenticationException: 
Authentication failed, URL: 
http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa,
 status: 403, message: Forbidden
at 
org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401)
at 
org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147)
at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213)

It seems that the proxy-user info in the timeline server has not been refreshed.
{code}




> Proxyuser access timeline and getdelegationtoken failed without Timeline 
> server restart
> ---
>
> Key: YARN-9510
> URL: https://issues.apache.org/jira/browse/YARN-9510
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Affects Versions: 3.1.0
>Reporter: Shen Yinjie
>Assignee: Shen Yinjie
>Priority: Major
>
> We add a proxy user by changing "hadoop.proxyuser.xx.yy"; if the timeline 
> server is not restarted, the YARN job fails and throws:
> {code:java}
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Authentication failed, URL: 
> http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa,
>  status: 403, message: Forbidden
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401)
>   at 
> org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213)
> {code}
> It seems that the proxy-user info in the timeline server has not been 
> refreshed.
> In a production cluster, we sometimes add a new proxy user at runtime and 
> expect the impersonation to take effect after executing a command like 
> "...refreshSuperUserGroupsConfiguration", without restarting the timeline 
> server.






[jira] [Assigned] (YARN-9510) Proxyuser access timeline and getdelegationtoken failed without Timeline server restart

2019-04-29 Thread Shen Yinjie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie reassigned YARN-9510:
-

Assignee: Shen Yinjie

> Proxyuser access timeline and getdelegationtoken failed without Timeline 
> server restart
> ---
>
> Key: YARN-9510
> URL: https://issues.apache.org/jira/browse/YARN-9510
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Affects Versions: 3.1.0
>Reporter: Shen Yinjie
>Assignee: Shen Yinjie
>Priority: Major
>
> We add a proxy user by changing "hadoop.proxyuser.xx.yy" and then execute 
> yarn rmadmin -refreshSuperUserGroupsConfiguration, but do not restart the 
> timeline server. The MR job fails and throws:
> {code:java}
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Authentication failed, URL: 
> http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa,
>  status: 403, message: Forbidden
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401)
>   at 
> org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213)
> It seems that the proxy-user info in the timeline server has not been refreshed.
> {code}






[jira] [Commented] (YARN-9518) cgroup subsystem in centos7

2019-04-29 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829064#comment-16829064
 ] 

Adam Antal commented on YARN-9518:
--

Hi [~shurong.mai], could you please provide a description of the issue? What 
is the bug, and what do you want to achieve?

> cgroup subsystem in centos7 
> 
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
>







[jira] [Commented] (YARN-9510) Proxyuser access timeline and getdelegationtoken failed without Timeline server restart

2019-04-29 Thread Shen Yinjie (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829039#comment-16829039
 ] 

Shen Yinjie commented on YARN-9510:
---

I propose to implement RefreshUserMappingsProtocol in 
ApplicationHistoryClientService; I'll upload a draft patch soon.
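
A rough sketch of what such a refresh could look like (hypothetical wrapper 
class; the RPC wiring into ApplicationHistoryClientService is omitted):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Groups;
import org.apache.hadoop.security.authorize.ProxyUsers;

// Re-reads the proxy-user and group-mapping configuration at runtime so a
// newly added hadoop.proxyuser.* entry is honored without a restart.
public class TimelineProxyUserRefresher {

  private final Configuration conf;

  public TimelineProxyUserRefresher(Configuration conf) {
    this.conf = conf;
  }

  // Mirrors RefreshUserMappingsProtocol#refreshSuperUserGroupsConfiguration.
  public void refreshSuperUserGroupsConfiguration() {
    ProxyUsers.refreshSuperUserGroupsConfiguration(conf);
  }

  // Mirrors RefreshUserMappingsProtocol#refreshUserToGroupsMappings.
  public void refreshUserToGroupsMappings() {
    Groups.getUserToGroupsMappingService(conf).refresh();
  }
}
{code}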

> Proxyuser access timeline and getdelegationtoken failed without Timeline 
> server restart
> ---
>
> Key: YARN-9510
> URL: https://issues.apache.org/jira/browse/YARN-9510
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Affects Versions: 3.1.0
>Reporter: Shen Yinjie
>Priority: Major
>
> We add a proxy user by changing "hadoop.proxyuser.xx.yy" and then execute 
> yarn rmadmin -refreshSuperUserGroupsConfiguration, but do not restart the 
> timeline server. The MR job fails and throws:
> {code:java}
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Authentication failed, URL: 
> http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa,
>  status: 403, message: Forbidden
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401)
>   at 
> org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213)
> It seems that the proxy-user info in the timeline server has not been refreshed.
> {code}






[jira] [Updated] (YARN-9510) Proxyuser access timeline and getdelegationtoken failed without Timeline server restart

2019-04-29 Thread Shen Yinjie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie updated YARN-9510:
--
Description: 
We add a proxy user by changing "hadoop.proxyuser.xx.yy" and then execute 
yarn rmadmin -refreshSuperUserGroupsConfiguration, but do not restart the 
timeline server. The MR job fails and throws:

{code:java}
Caused by: 
org.apache.hadoop.security.authentication.client.AuthenticationException: 
Authentication failed, URL: 
http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa,
 status: 403, message: Forbidden
at 
org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401)
at 
org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147)
at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213)

It seems that the proxy-user info in the timeline server has not been refreshed.
{code}



  was:
We add a proxy user by changing "hadoop.proxyuser.xx.yy" and then execute 
yarn rmadmin -refreshSuperUserGroupsConfiguration, but do not restart the 
timeline server. The MR job fails and throws:
Caused by: 
org.apache.hadoop.security.authentication.client.AuthenticationException: 
Authentication failed, URL: 
http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa,
 status: 403, message: Forbidden
at 
org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401)
at 
org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147)
at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213)

It seems that the proxy-user info in the timeline server has not been refreshed.



> Proxyuser access timeline and getdelegationtoken failed without Timeline 
> server restart
> ---
>
> Key: YARN-9510
> URL: https://issues.apache.org/jira/browse/YARN-9510
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Affects Versions: 3.1.0
>Reporter: Shen Yinjie
>Priority: Major
>
> We add a proxy user by changing "hadoop.proxyuser.xx.yy" and then execute 
> yarn rmadmin -refreshSuperUserGroupsConfiguration, but do not restart the 
> timeline server. The MR job fails and throws:
> {code:java}
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Authentication failed, URL: 
> http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa,
>  status: 403, message: Forbidden
>   at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401)
>   at 
> org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147)
>   at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213)
> It seems that the proxy-user info in the timeline server has not been refreshed.
> {code}






[jira] [Updated] (YARN-9518) cgroup subsystem in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Summary: cgroup subsystem in centos7   (was: cgroup in centos7)

> cgroup subsystem in centos7 
> 
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
>







[jira] [Updated] (YARN-9518) cgroup in centos7

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9518:
--
Priority: Major  (was: Critical)

> cgroup in centos7
> -
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shurong Mai
>Priority: Major
>  Labels: cgroup, patch
>







[jira] [Created] (YARN-9518) cgroup in centos7

2019-04-29 Thread Shurong Mai (JIRA)
Shurong Mai created YARN-9518:
-

 Summary: cgroup in centos7
 Key: YARN-9518
 URL: https://issues.apache.org/jira/browse/YARN-9518
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Shurong Mai









[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Description: 
yarn-site.xml
{code:java}
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>false</value>
</property>
{code}
 

When aggregation is not enabled, we click the "container log link" (on the 
web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") 
after a job has finished successfully.

After the click it jumps to a page displaying "Aggregation is not enabled. 
Try the nodemanager at yy:48038", and the URL is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".

I also found this problem in all Hadoop 2.x.y and 3.x.y versions, and I have 
submitted a simple patch that applies to these versions.

  was:
When aggregation is not enabled, we click the "container log link" (on the 
web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") 
after a job has finished successfully.

After the click it jumps to a page displaying "Aggregation is not enabled. 
Try the nodemanager at yy:48038", and the URL is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".

I also found this problem in all Hadoop 2.x.y and 3.x.y versions, and I have 
submitted a simple patch that applies to these versions.


> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: patch
> Attachments: YARN-9517.patch
>
>
> yarn-site.xml
> {code:java}
> <property>
>   <name>yarn.log-aggregation-enable</name>
>   <value>false</value>
> </property>
> {code}
>  
> When aggregation is not enabled, we click the "container log link" (on the 
> web page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") 
> after a job has finished successfully.
> After the click it jumps to a page displaying "Aggregation is not enabled. 
> Try the nodemanager at yy:48038", and the URL is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
> I also found this problem in all Hadoop 2.x.y and 3.x.y versions, and I 
> have submitted a simple patch that applies to these versions.






[jira] [Reopened] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai reopened YARN-9517:
---

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
> Attachments: YARN-9517.patch
>
>
> When aggregation is not enabled, we click the "container log link" (on the web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL")
>  after a job has finished successfully.
> It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038", and the URL is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"
> I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I have 
> submitted a simple patch which can be applied to these hadoop versions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Labels: patch  (was: )

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: patch
> Attachments: YARN-9517.patch
>
>
> When aggregation is not enabled, we click the "container log link" (on the web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL")
>  after a job has finished successfully.
> It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038", and the URL is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"
> I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I have 
> submitted a simple patch which can be applied to these hadoop versions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai resolved YARN-9517.
---
Resolution: Fixed

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
>  Labels: patch
> Attachments: YARN-9517.patch
>
>
> When aggregation is not enabled, we click the "container log link" (on the web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL")
>  after a job has finished successfully.
> It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038", and the URL is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"
> I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I have 
> submitted a simple patch which can be applied to these hadoop versions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828982#comment-16828982
 ] 

Shurong Mai edited comment on YARN-9517 at 4/29/19 7:20 AM:


We have applied the patch to our Hadoop and tested it OK 


was (Author: shurong.mai):
We have applied the patch to our hadoop and test ok 

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
> Attachments: YARN-9517.patch
>
>
> When aggregation is not enabled, we click the "container log link" (on the web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL")
>  after a job has finished successfully.
> It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038", and the URL is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"
> I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I have 
> submitted a patch which can be applied to these hadoop versions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828982#comment-16828982
 ] 

Shurong Mai edited comment on YARN-9517 at 4/29/19 7:20 AM:


We have applied the patch to our Hadoop and tested it OK.


was (Author: shurong.mai):
We had applied the patch to our hadoop and test ok 
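For context on what the patched link has to point at: when aggregation is 
disabled, container logs stay on each NodeManager's local disks, in the 
directories configured by the standard yarn.nodemanager.log-dirs property. A 
hedged sketch follows; the paths are illustrative only (by default the logs sit 
under the NodeManager's own log directory):

{code:java}
<!-- Hedged sketch: local NodeManager directories that hold container logs
     when aggregation is disabled; the comma-separated paths below are
     illustrative, not from this issue or its patch. -->
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data1/yarn/container-logs,/data2/yarn/container-logs</value>
</property>
{code}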

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
> Attachments: YARN-9517.patch
>
>
> When aggregation is not enabled, we click the "container log link" (on the web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL")
>  after a job has finished successfully.
> It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038", and the URL is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"
> I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I have 
> submitted a patch which can be applied to these hadoop versions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Description: 
When aggregation is not enabled, we click the "container log link" (on the web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") 
after a job has finished successfully.

It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038", and the URL is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"

I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I have 
submitted a simple patch which can be applied to these hadoop versions.

  was:
When aggregation is not enabled, we click the "container log link" (on the web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") 
after a job has finished successfully.

It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038", and the URL is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"

I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I have 
submitted a patch which can be applied to these hadoop versions.


> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7, 3.1.2
>Reporter: Shurong Mai
>Priority: Major
> Attachments: YARN-9517.patch
>
>
> When aggregation is not enabled, we click the "container log link" (on the web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL")
>  after a job has finished successfully.
> It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038", and the URL is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"
> I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I have 
> submitted a simple patch which can be applied to these hadoop versions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Affects Version/s: 2.2.0
   2.3.0
   2.4.1
   2.5.2
   2.6.5
   3.2.0
   2.9.2
   2.8.5

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 
> 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>
> When aggregation is not enabled, we click the "container log link" (on the web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL")
>  after a job has finished successfully.
> It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038", and the URL is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"
> I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I have 
> submitted a patch which can be applied to these hadoop versions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Description: 
When aggregation is not enabled, we click the "container log link" (on the web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") 
after a job has finished successfully.

It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038", and the URL is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"

I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I have 
submitted a patch which can be applied to these hadoop versions.

  was:
When aggregation is not enabled, we click the "container log link" (on the web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") 
after a job has finished successfully.

It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038", and the URL is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"

I also found this problem in all hadoop versions 2.x and 3.x.


> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>
> When aggregation is not enabled, we click the "container log link" (on the web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL")
>  after a job has finished successfully.
> It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038", and the URL is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"
> I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I have 
> submitted a patch which can be applied to these hadoop versions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Description: 
When aggregation is not enabled, we click the "container log link" (on the web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") 
after a job has finished successfully.

It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038", and the URL is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"

  was:Aggregation is not enabled, when we click 


> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>
> When aggregation is not enabled, we click the "container log link" (on the web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL")
>  after a job has finished successfully.
> It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038", and the URL is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Description: 
When aggregation is not enabled, we click the "container log link"(in web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) 
after a job is finished successfully.

It will jump to the webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038" after we click, and the url is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop;

I also fund this problem in all hadoop version  2.x and 3.x.

  was:
When aggregation is not enabled, we click the "container log link" (on the web page 
"http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") 
after a job has finished successfully.

It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
nodemanager at yy:48038", and the URL is 
"http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"


> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>
> When aggregation is not enabled, we click the "container log link" (on the web 
> page 
> "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL")
>  after a job has finished successfully.
> It then jumps to a webpage displaying "Aggregation is not enabled. Try the 
> nodemanager at yy:48038", and the URL is 
> "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop"
> I also found this problem in all hadoop versions 2.x and 3.x.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Summary: When aggregation is not enabled, can't see the container log  
(was: When Aggregation is not enabled, can't see the container log)

> When aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>
> Aggregation is not enabled, when we click 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9517) When Aggregation is not enabled, can't see the container log

2019-04-29 Thread Shurong Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shurong Mai updated YARN-9517:
--
Description: Aggregation is not enabled, when we click 

> When Aggregation is not enabled, can't see the container log
> 
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.7
>Reporter: Shurong Mai
>Priority: Major
>
> Aggregation is not enabled, when we click 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org